Chapter 1

INTRODUCTION 


Statistical methods are concerned with the reducing of either large or
small masses of data to a few convenient descriptive terms and with the
drawing of inferences therefrom. The data are collected by any of several
methods of research with the aid of measuring devices appropriate to a
given area of investigation. The research methods are variously named and
classified. Thus in psychology we have methods which are labeled
experimental, clinical, observational, etc. The devices for measuring or
securing responses vary from those which involve delicate apparatus through
paper-and-pencil schemes to controlled observations and interviews.
Statistical techniques are not to be considered as coordinate either with
research methods or with devices for obtaining and recording responses,
but rather as tools for analyzing data collected by whatever means.

The reduction of a batch of data to a few descriptive measures is the part
of statistical analysis which should lead to a better over-all comprehension
of the data. All readers will be more or less familiar with the concept of
average. An average is a measure which describes what is typical of a
group with respect to some trait, characteristic, or variable. If we are
comparing two or more groups, the determination of an average for each
group permits a better appraisal of possible group differences than would
be obtained by casual examination of the data. There are various statistical
measures, or types of averages, which have proven useful as descriptive
terms for a variety of data. One aim of this book is to present and discuss
the descriptive statistical measures most frequently needed in psychological
research. Proper usage and interpretation of these terms and evaluation
of their use by others are not possible without knowledge of their meaning
and their limiting assumptions. Incidentally, the user of statistical measures
must give some thought to computational procedures.

It is necessary not only to define descriptive measures but also to
distinguish between the usage of a given measure as being descriptive of a
sample as opposed to a population. Since sample values are knowns (i.e.,
computable) whereas the corresponding population values are unknowns (but
estimable), we will in this book define and discuss the descriptive
measures in terms of samples and subsequently consider the problem of
drawing inferences about, or estimating, population values. Sample values
are frequently referred to as statistics and population values are called
parameters.

That part of statistical analysis which has to do with the drawing of
inferences is imposed on us because of certain inadequacies of research
facilities. An investigator who wishes to know the average height of adult
women in the United States will never have facilities for measuring every
woman. Accordingly, he is compelled to measure a sample of women; then on
the basis of information yielded by the sample he draws an inference
concerning the average height of the population of women. Another
investigator, wishing to evaluate the relative merits of two learning
methods, tries out the methods with two small groups of students and, from
the results, makes an inference concerning what might be expected if he had
facilities for working with very large groups. An opinion poller may seek
information about the reactions of Republicans and Democrats to some world
event. By questioning a sample of each, he can secure sufficient data for
drawing an inference regarding [...] of Republicans [...].

The problem of statistical inference is usually that of determining whether
statistical significance can be attached to results after due allowance is
made for known sources of error. There are many and varied situations
requiring tests of significance, and accordingly several tests are
available. Sound inferences cannot be made by those who do not understand
the purposes, assumptions, and applicability of the various techniques for
judging significance.

It is in connection with the problem of drawing inferences that a
knowledge of statistical methods is most helpful. A research should be
planned in such a way that the resulting data are amenable to treatment by
statistical techniques. With sufficient information concerning these
techniques of analysis, one should be able to lay out in advance of data
collecting the main types of statistical analysis to be used. If a proposed
experimental setup precludes the possibility of adequate analysis, it may
be found that a slight alteration in the plan will remedy the situation. All
too often a statistician is called in to help with data which have not
been collected in such a manner as to permit efficient analysis. Only by
knowing the available methods of analysis can one plan a research with
assurance that the results can be handled statistically.

Another reason for keeping in mind statistical considerations when
planning a research is the fact that some experimental designs are
preferable because they permit, with small additional cost, or even at a
saving, better control of error than other plans. Indeed, certain designs
lead to a marked reduction in known sources of error.

A third reason for planning with foresight regarding the statistical
analysis is that a set of data can sometimes be made to serve for checking
several different hypotheses.

The student should be warned that he cannot expect miracles to be
wrought by the use of statistical tools. Although statistical methods have
an important place in present-day psychological research, it does not
follow that they can be utilized to salvage data that result from a
haphazardly planned and sloppily executed investigation. No amount of
statistical juggling can transfigure bad data into acceptable form. It is
doubtful whether the student who comes to the statistician with a batch of
data and the question, “Can I compute a correlation coefficient . . . ?”
will make a scientific contribution, but such a student deserves sympathy,
especially if his major advisor has suggested that he need not worry about
statistics until he has collected data.

The purpose of the present book is to acquaint the student with the
statistical techniques commonly used, to suggest economical computational
procedures, and to state the assumptions and limitations of the
various techniques. Whenever the understanding of a particular technique
can be clarified by a simple derivation, such a derivation will be
given. Unfortunately, many of the derivations are too complicated
mathematically to permit consideration in an elementary or intermediate
treatment. The qualified and interested student will find some of these
derivations in more advanced textbooks and others in original sources.

Statistical methods belong in the realm of applied mathematics and
consequently extensive scholarship in mathematics is required of those
who choose to specialize in statistics. It is possible, however, to secure a
practical working knowledge of statistical techniques without first
becoming a mathematician, provided the deficiency in mathematics is not
accompanied by an emotional reaction to symbols.

Within the realm of psychological research there is wide variation in the
need for statistical procedures. We can find current research reports which
involve no use of statistics, some which involve very simple statistical
treatment, still others which lean heavily on the tools of statistics, and a
few which are highly statistical. We need not shift from one area of
investigation to another to find this variation, but it is true that certain
areas of psychology have greater dependency than others on statistical
procedures. The area of psychology which seems the most dependent on
statistics is psychological measurement. This dependency is due mainly to
[...]

The presence or absence of statistical analysis per se is not a safe
criterion for judging the worth of a study; some studies would have been
better had they made more use of statistics, whereas others would be better
if they had been so designed as to depend less on statistical analysis.
Except for the requirement that the statistical analysis be adequate, there
are no general rules as to how statistical a research should be. Of two
experimental plans, either of which would provide appropriate data for
checking a hypothesis or sets of hypotheses, that plan which calls for
simple statistical analysis is certainly preferable to the one which
requires elaborate analysis. [...]



Chapter 2 

TABULAR AND GRAPHIC 

METHODS 


When we are faced with a mass of data, the first manipulative step is to
tabulate. If we are dealing with the number of children [...] tabulate our
1000 into 21 different inch groups. If we also know the [...] tabulations
will show marked differences as we pass from trait to trait.

For most purposes it is adequate if we tabulate, or classify, individuals
[...] of classifying our [...] but not more than 70 [...] such as to permit
at least 10 or 12 [...]

Table 2.1. Frequency distribution of IQs for 161 five-year-old boys

Interval      f     Smoothed f    Cumulative f

160-169       1         .3            161
150-159       0        1.3            160
140-149       3        4.0            160
130-139       9       13.7            157
120-129      29       25.7            148
110-119      39       34.3            119
100-109      35       35.3             80
 90-99       32       25.0             45
 80-89        8       14.0             13
 70-79        2        3.7              5
 60-69        1        1.3              3
 50-59        1        1.0              2
 40-49        1         .7              1

[...] One may use either dots or tally marks when tabulating. The tallies
for each interval can be counted and recorded to the right of the interval
[...] number of individuals in all the grouping intervals [...] 139.5, but
if the ages of individuals have been taken as at the last birthday, the
interval 20-24 would have actual limits of 20 and 24.999+.
Obviously for purposes of tabulation we need not use the implied actual 
limits, and for computational purposes we usually need either the lower 
limit or the midpoint of certain intervals, so there is nothing to be gained 
by meticulously labeling the intervals with actual limits. 

GRAPHIC PRESENTATION 

If we scrutinize the tally marks or the frequency table we can obtain
some notion as to how the individual values are distributed. A number of
pictorial schemes have been suggested as aids in the study of frequency
distributions. It is possible to lay off the various values (or intervals) of
the variable on the horizontal or x axis, and to let the vertical or y axis
represent the frequency per value or interval. The frequencies of the
several intervals can be represented by drawing a horizontal line over
each interval at the height corresponding to the number of cases in that
interval, and then connecting these horizontals with verticals erected at the
interval limits. This yields a histogram (Fig. 2.1). Using the same
arrangement of the vertical and horizontal scales, we can merely indicate
the frequency with a dot or cross placed directly above the midpoint of the
interval, and then connect the adjacent points with straight lines. This
results in a frequency polygon (Fig. 2.2). Such a polygon or the
corresponding histogram will usually show irregularities; on the assumption
that these are due to the operation of chance, we can draw a smooth curve,
cutting as near the points as possible, and this curve can be thought of as
giving a better picture than the original polygon. A curve which is
obtained by freehand drawing or by graphic smoothing schemes or by
smoothing of the frequencies by a method of moving averages is known as a
smoothed frequency curve.

Fig. 2.1. Histogram for data of Table 2.1.

One method of moving averages is illustrated in Table 2.1, in which an
average is taken over three intervals. The smoothed value for an interval
is obtained by summing the frequencies in that interval and the two
adjacent intervals and dividing by 3. Thus, for the 90 interval, 8, 32, and
35 are summed and the sum is divided by 3. The student should plot both the
original and smoothed frequencies so as to compare the two graphs.
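
The three-interval moving average can be checked with a short Python sketch. The frequencies are those of Table 2.1; the function name `smooth3` is ours, and the treatment of the two end intervals (the missing neighbor is counted as a frequency of 0) is an assumption, although it does reproduce the tabled values of .7 and .3.

```python
# Frequencies of Table 2.1, listed from the lowest interval (40-49) upward.
freqs = [1, 1, 1, 2, 8, 32, 35, 39, 29, 9, 3, 0, 1]

def smooth3(f):
    """Three-interval moving average.

    Assumption: an interval beyond either end of the table has frequency 0.
    """
    padded = [0] + list(f) + [0]
    return [(padded[k - 1] + padded[k] + padded[k + 1]) / 3
            for k in range(1, len(padded) - 1)]

smoothed = smooth3(freqs)

# Print the original and smoothed frequencies side by side.
for low, f, s in zip(range(40, 170, 10), freqs, smoothed):
    print(f"{low}-{low + 9}: f = {f:2d}, smoothed f = {s:4.1f}")
```

For the 90 interval the sketch sums 8, 32, and 35 and divides by 3, as in the text.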

Although it is easy to depict a frequency distribution by a histogram, by
a frequency polygon, or by a smoothed frequency curve, it is necessary that
we note a shift in interpretation as we pass from the histogram to the
polygon to the curve. In drawing the histogram, we are in effect drawing a
series of vertical bars with a common boundary for any two that are
adjacent to each other. Since the height of each bar represents a
frequency, we may, by arbitrarily assigning unity as the width of each bar,
say that the area of a bar also represents a frequency. Then the sum of the
areas of the several bars will be the total number of cases.

Think of the polygon in Fig. 2.2 as being superimposed on the histogram of
Fig. 2.1, and imagine that the common boundaries of the bars are erased.
The remaining parts of the bars then have the appearance of an up and then
down irregular staircase. A little thought should convince the reader
that the total area under this staircase is N, or precisely the same as the
sum of the areas of all the bars.

Next consider the polygon. Note that as we pass from interval to 
interval, the polygon in conjunction with the staircase histogram forms a 
series of pairs of equal-area triangles. One of each pair is an area included 
under the polygon but not under the histogram, whereas the other is an 
area included under the histogram but not under the polygon. The net 
effect of this balancing of areas, in and out, is that the total areas under the 
polygon and histogram are equal; each total area represents N. 
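
The equality of the two areas can be verified numerically. The sketch below is ours, not the author's: it takes each interval as one unit wide and assumes that the polygon is brought down to the base line at the midpoint of the empty interval beyond each end, which is the usual way of closing a frequency polygon.

```python
# Frequencies of Table 2.1, lowest interval first.
freqs = [1, 1, 1, 2, 8, 32, 35, 39, 29, 9, 3, 0, 1]
N = sum(freqs)

# Histogram: each bar has unit width, so its area equals its frequency.
hist_area = sum(f * 1 for f in freqs)

# Polygon: heights at successive midpoints, closed to zero at both ends
# (assumption). Each pair of adjacent points bounds a trapezoid of width 1.
heights = [0] + freqs + [0]
poly_area = sum((a + b) / 2 for a, b in zip(heights, heights[1:]))

print(N, hist_area, poly_area)
```

Each trapezoid cuts one triangle off a bar and adds an equal triangle beside it, which is the balancing of areas described above.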

Now it should not stretch our imagination too much to regard the total 
area under a smoothed polygon or under a frequency curve as being equal 
to N. With this notion that area, not height, represents frequency, we can 
readily speak of the area under the curve between ordinates erected at any 
two score values on the base line (x axis) as the number of cases between 
the two score points. And of course the area under any part of the curve 
could be expressed as a proportion or a percentage of the total area.

This concept of area as frequency will have considerable value for us as 
a basis for interpreting certain statistical measures, and the concept will be 
indispensable to our understanding of certain “ideal,” or mathematical, 
frequency curves, as yet undefined. 

Another type of graph can be obtained by the use of cumulative
frequencies. In Table 2.1 is a column headed “Cumulative f.” These values
are obtained by successive adding of the frequencies, beginning with the
lowest interval. Adding 1 and 1 gives 2, adding to this the next frequency
gives 3, to which in turn is added the next, giving 5, and so on until we have
160 plus 1 for the last cumulative value, which is the total number of cases.

Fig. 2.4. Ogive for data of Table 2.1.

Obviously, from the cumulative table we can tell how many individuals 
fall below a given point. If we plot the cumulative values and connect the 
plotted points, an ogive curve results (Fig. 2.4). Note that, in plotting the 
cumulative frequencies, we do not use the midpoint of the interval, but 
rather the upper boundary. Why?
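
The cumulative column and the points of the ogive can be produced as follows. This is a minimal sketch: the frequencies are those of Table 2.1, and the upper boundaries (49.5, 59.5, and so on) rest on the assumption that IQs are recorded to the nearest whole number.

```python
from itertools import accumulate

# Expressed lower limits and frequencies of Table 2.1, lowest interval first.
lower_scores = list(range(40, 170, 10))
freqs = [1, 1, 1, 2, 8, 32, 35, 39, 29, 9, 3, 0, 1]

# Successive adding of the frequencies, beginning with the lowest interval.
cum = list(accumulate(freqs))

# Each cumulative value counts every case up to the END of its interval,
# so it is paired with the interval's upper boundary (assumed x9.5).
upper_bounds = [low + 9 + 0.5 for low in lower_scores]
ogive_points = list(zip(upper_bounds, cum))

for x, c in ogive_points:
    print(f"below {x}: {c} cases")
```

The pairing in `ogive_points` also suggests the answer to the question just asked of the reader.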

The use of frequency polygons in the comparison of two groups is quite
simple and often very enlightening. All that is necessary is to plot the data
for both groups on the same sheet and with reference to the same axes. If
the number of cases in the two groups differs markedly, a better
comparison can be obtained by converting the frequencies for each group to
percentages of the total number in each group. Polygons based on
percentage frequencies will not portray differences which are merely a
reflection of differing Ns and therefore are more comparable. A glance at
two such frequency polygons will reveal whether the two groups show marked
differences in the trait in question or to what extent the two distributions
overlap. More refined methods for comparing groups are discussed later.
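
The conversion to percentage frequencies is illustrated below. The two sets of frequencies are invented for the purpose: group B is simply group A with every frequency multiplied by 10, so that the two differ only in N.

```python
def to_percent(freqs):
    """Express each frequency as a percentage of the group's own N."""
    n = sum(freqs)
    return [100 * f / n for f in freqs]

# Hypothetical data: same shape of distribution, very different Ns.
group_a = [2, 6, 12, 6, 2]        # N = 28
group_b = [20, 60, 120, 60, 20]   # N = 280

pct_a = to_percent(group_a)
pct_b = to_percent(group_b)

for a, b in zip(pct_a, pct_b):
    print(f"{a:5.1f}%  {b:5.1f}%")
```

Plotted as raw frequencies, the polygon for group B would tower over that for group A; plotted as percentages, the two coincide.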

When we wish to picture a discrete series, it is customary to use either 
horizontal or vertical bars, separated from each other, to represent the 
several frequencies. As in the case of frequency polygons and histograms, 
there are no hard and fast rules regarding the heights (or lengths) of the 
bars relative to the horizontal (or vertical) base. The student should 
attempt to avoid extreme lack of proportion. Newspapers and magazines 
often represent frequencies as areas or solids. A circular diagram, or 
pie chart, in which the sizes of the separate sectors represent the percentage 
falling into given groups or classes is sometimes used to picture relative 
frequencies. There is some evidence, and a general consensus of opinion, 
that some type of linear graph is less likely to be misinterpreted than one
that depends on areas or solids. 

Another type of graphical representation is used to picture the
relationship between two variables, e.g., growth in stature and age, or price
change with year. To make such a line graph, we can lay off time or age or
trials on the horizontal axis, choose a convenient scale on the y axis for the
other variable, and then plot the observational values. The line graph
should be arranged so that the graph is read from left to right and from the
bottom to the top, and the scales on the two axes should allow the inclusion
of all observed values of the two variables and at the same time permit a
well-balanced or well-proportioned picture. A line graph can be made
misleading by the choice of the scales on the two axes. For instance, if we
are plotting the practice curve for card sorting (number of cards sorted on
y axis, trial number on x axis), it is possible to make a tremendous
difference in the appearance of the graph simply by altering the scale on
the y axis. Of two curves which represent the same relationship, one
(Fig. 2.5) would give the impression that the learning had progressed quite
rapidly, whereas the other (Fig. 2.6) would lead us to think that progress
was slow. The student will do well to develop a healthy scepticism of all
graphs he encounters for the simple reason that either scale can be so
selected as to lead to gross misinterpretation.

Fig. 2.5. Learning curve (same data as Fig. 2.6).
Fig. 2.6. Learning curve (same data as Fig. 2.5).

It should be noted that smoothing may be applied to line graphs as well
as to frequency polygons. Often, if a line graph is smoothed, the
relationship between the two variables can be more adequately characterized.
Smoothing out the irregularities helps us to see whether the relationship
is linear or logarithmic or parabolic or of some other common type.
Frequently a verbal description of a curve will aid in understanding 
something of the functional relatedness of the two variables. To state a 
relationship in more exact mathematical language involves the application 
of some form of curve fitting by which the constants of the equation can be 
determined. 

The student who is interested in a complete discussion and treatment of 
graphic methods is referred to books on the subject by Brinton and by 
Arkin and Colton.* 

* Brinton, W. C., Graphic presentation, New York: Brinton Associates, 1939; Arkin,
Herbert, and Colton, R. R., Graphs, how to make and use them, New York: Harper,
1936.


Chapter 3 

DESCRIBING FREQUENCY 
DISTRIBUTIONS 


It has been implied in Chapter 2 that a variable, such as height, IQ, or 
reading ability, can be represented by X, where X takes on various values, 
i.e., varies from individual to individual. Obviously, X is not used here to 
represent an unknown but rather as a symbol for any of several known 
quantities. When a frequency polygon is drawn and smoothed, it is often 
found to be a curve which has a peak or maximum near the center of the 
Xs and drops off gradually toward the base line or x axis on either side 
of the point of maximum value. In other words, a typical frequency curve 
(or polygon) or a frequency distribution can be roughly characterized as one 
which shows four chief features: a clustering of individuals toward some 
central value, dispersion about this value, symmetry or lack of symmetry, 
and flatness or steepness. Many variables or traits yield distributions 
which are said to be approximately bell-shaped, but such a description is 
not adequate for scientific purposes. We want to know about what 
particular value and with how much scatter the individual scores are
distributed, to what extent the distribution is symmetrical, and to what
degree it is peaked or flat. That is, we need measures of central value or 
tendency, measures of scatter or dispersion or variability, and measures of 
skewness (lack of symmetry) and of kurtosis (peakedness or flatness). 
With such measures, we can describe the distribution mathematically, and 
in such a way that a statistically trained contemporary, say in Melbourne, 
can picture to himself the frequency distribution. 

Thus we are led to a consideration of the various measures of central
value, dispersion, skewness, and kurtosis. It is adequate and usually more
economical of time to determine these measures from frequency
distributions rather than from the original undistributed scores. Since the
computation of the descriptive terms frequently involves a determination of the
lower limit or midpoint of a class interval, the student should recall what 
has been said about actual and expressed class limits. Obviously, if we 
need the midpoint of an interval, it is necessary only to add one-half the 
size of the interval to the actual lower limit, which must be determined by a 
consideration of the nature of the scores or measures which constitute the 
variable. Psychological measurements and test scores are usually treated as 
though rounded to the nearest value. 

MEASURES OF CENTRAL VALUE 

The mode. A glance at a typical frequency distribution will indicate to 
us the most frequently occurring X value, or for grouped data the group of
X values which has the greatest frequency. This maximal frequency
roughly defines the mode. For nongrouped data the mode is the X value 
having the greatest frequency, whereas for grouped data the mode is taken 
as the midpoint of the interval which has the greatest frequency. For a 
smoothed frequency curve, the mode is the X value at which the curve 
reaches its maximum height. The mode is one indicator of central value, 
but as a descriptive statistic it has serious limitations. If a different size 
interval is used, the mode may be decidedly different. Furthermore, it 
occasionally happens that two nonadjacent intervals have the same
maximal frequency, thereby yielding two modal values. Such a distribution is
said to be bimodal, but it should be noted that the bimodality may not be 
real but merely accidental, the resultant of the particular grouping interval 
chosen. In dealing with certain discrete series, like size of family, the 
modal value is apt to be more typical than some other measure of central 
value and therefore should be used, even though as a measure it is subject 
to greater sampling fluctuations than either the mean or the median. 
(The question of sampling cannot be discussed at this time; the student is 
asked to take on faith statements regarding the efficiency of a given 
statistic.) 
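
How much the mode depends on the grouping can be seen in a small sketch. The frequencies used are those of the spool-packing distribution given later in Table 3.1; the helper names are ours, and the midpoints assume whole-number scores, so that an interval expressed as 260-269 has an actual lower limit of 259.5.

```python
# Expressed lower limits and frequencies, lowest interval first (Table 3.1).
lows = list(range(210, 320, 10))
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]

def grouped_modes(lows, freqs, width):
    """Midpoints of all intervals that share the greatest frequency."""
    top = max(freqs)
    return [low - 0.5 + width / 2 for low, f in zip(lows, freqs) if f == top]

def regroup(freqs, k):
    """Combine every k adjacent intervals into one wider interval."""
    return [sum(freqs[j:j + k]) for j in range(0, len(freqs), k)]

modes_10 = grouped_modes(lows, freqs, 10)

# The same scores regrouped into intervals of size 20 (210-229, 230-249, ...).
wide_freqs = regroup(freqs, 2)
modes_20 = grouped_modes(lows[::2], wide_freqs, 20)

print(modes_10, modes_20)
```

With intervals of 10 the mode is the midpoint of 260-269; with intervals of 20 it shifts to the midpoint of 250-269. A returned list with more than one entry would signal the kind of bimodality discussed above.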

The median. As a measure of central value, the median is defined in two
ways: (1) if the individual scores are arranged in order with respect to
some trait, the median is the value of the midmost individual if N is odd, or
lies midway between the two middle individuals when N is even; (2) when
a distribution has been made, the median is defined as the point on the
scale such that the frequency above or below the point is 50 per cent of the
total frequency. For grouped data, the median may be determined by the
following steps:

1. Find one-half of N.

2. Count the frequencies in a cumulative manner from the bottom up to
that interval, say the sth, the frequency of which if included would give
more than, if not included less than, N/2 cases. Obviously the median
will fall somewhere in this interval unless exactly half the values fall below
the lower limit of an interval, in which case this lower limit is the median.
Let F_c equal the total frequency up to the sth interval, and let F_s equal the
frequency in the sth interval.

3. (N/2 - F_c)/F_s will be the proportional distance required in the sth
interval to locate the median.

4. Letting i equal the size of the interval and LL_s the lower limit of the
sth interval, the median will be given by

$$\mathrm{Mdn} = LL_s + i\,\frac{N/2 - F_c}{F_s} \qquad (3.1)$$

This involves the defensible assumption that the scores for the cases falling
in the sth interval are distributed fairly evenly over the possible score values
in the interval.

Table 3.1. The calculation of the median

Score        f

310-319      1
300-309      2
290-299      4      N/2 = 25
280-289      1      sth interval is 260-269
270-279      6      F_c = 24, F_s = 12
260-269     12      i = 10
250-259     11      LL_s = 259.5
240-249      8      Mdn = 259.5 + 10 (25 - 24)/12 = 260.33
230-239      2
220-229      0
210-219      3

N =         50

The calculation of the median is illustrated in Table 3.1, in which is given 
the distribution of scores made by 50 college men on the Brown spool 
packer. The score is the number of spools packed in four 1-minute trials. 
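
The four steps for the median translate directly into a short routine. What follows is a sketch under the assumption that scores are whole numbers, so that the actual lower limit of an interval is one-half unit below its expressed lower limit; the function name `grouped_median` is ours.

```python
def grouped_median(lows, freqs, width):
    """Median of a grouped distribution by formula (3.1).

    lows  : expressed lower limits, lowest interval first
    freqs : frequency in each interval
    width : interval size i
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("no cases")
    half = n / 2                      # step 1
    below = 0                         # F_c, frequency below the interval
    for low, f in zip(lows, freqs):   # step 2: count up from the bottom
        if below + f >= half:
            ll = low - 0.5            # actual lower limit (assumption)
            # steps 3 and 4: proportional distance into the interval
            return ll + width * (half - below) / f
        below += f

# Data of Table 3.1, lowest interval (210-219) first.
lows = list(range(210, 320, 10))
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]

mdn = grouped_median(lows, freqs, 10)
print(round(mdn, 2))
```

When exactly half the cases fall below the lower limit of an interval, the routine returns that limit, in agreement with step 2.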

The chief merits of the median are its ease of computation, its
independence of extremes (it can be computed even if a known number of extremes
have not been measured), and the fact that it is not affected by the size of 
extremes. This last point will be clearer after a discussion of the mean. 
The mean. This arithmetic average will already be familiar to most
readers. The mean is defined simply as the sum of all the scores or measures
divided by their number, or

$$M = \frac{\sum X}{N} \qquad (3.2)$$

where X represents any score, the symbol Σ means “the sum of,” and N is
the total number of cases. When N is small, this definition form can be 
used to compute the mean, but when N is large, say 50, 100, or more, such 
a method is not economical of time. Ordinarily, when N is large, we make 
a frequency distribution from which it is possible to compute the mean 
and median and other statistical measures. Assuming that the midpoint 
of an interval is typical of all the individuals in the interval, we can obtain 
the mean by summing the products of the several midpoints times their 
respective frequencies and dividing this sum by N. The error introduced 
by the use of midpoints is nonsystematic, i.e., tends to be ironed out so far 
as the computed mean is concerned. 
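
The midpoint method can be tried on the spool-packing distribution of Table 3.1. The sketch assumes whole-number scores, so that the midpoint of the interval 210-219 is 214.5.

```python
# Data of Table 3.1, lowest interval (210-219) first.
lows = list(range(210, 320, 10))
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]
width = 10

# Midpoint = actual lower limit + half the interval size (assumes integer scores).
midpoints = [low - 0.5 + width / 2 for low in lows]

N = sum(freqs)
# Sum of midpoint-times-frequency products, divided by N.
mean = sum(m * f for m, f in zip(midpoints, freqs)) / N

print(N, mean)
```

Each midpoint stands for every score in its interval, which is the assumption stated in the text.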

The computation of the mean can be shortened further by use of an 
arbitrary origin and deviations therefrom. The reasonableness of such a 
procedure can be readily grasped by considering the problem of
determining the mean height of a group of men. We could measure each man’s
height from the floor or as so much in excess of a stationary bar 5 feet from 
the floor. The sum of the excesses divided by N will be the mean excess, 
and obviously we must add 5 feet to this to obtain the mean height of the 
group. 

When we have a frequency distribution the arithmetic can be shortened 
still further by expressing the deviation from an arbitrary origin in terms of 
step intervals, that is, as the number of intervals that a given interval 
deviates from the arbitrary origin. The arbitrary origin is taken as the 
midpoint of any interval, and it is assumed that the midpoint of each 
interval may be taken as representing the scores in that interval. 

The procedure can be developed by simple algebra. Let AO be the
arbitrary origin, i be the interval size, and d be the deviation in step intervals
of the midpoint of any interval from AO. Then each score can be expressed
as X = AO + id, in which AO and i are constant and d varies. From the
definition formula for the mean we have

$$M = \frac{\sum (AO + id)}{N} = \frac{\sum (AO)}{N} + \frac{\sum id}{N}$$

Now Σ(AO) will equal N(AO) because summing a constant N times is the
same as multiplying it by N. As an exercise, the student should demonstrate,
by taking varying numbers each multiplied by a constant, that Σid = iΣd;
a constant can be brought out from under the summation sign. Hence
we have

$$M = \frac{N(AO)}{N} + \frac{i \sum d}{N} = AO + i\,\frac{\sum d}{N}$$

Since we are working with a frequency distribution, Σd must be taken over
all N cases, but the sum for a particular interval is simply f times its d,
so that the formula may be written as

$$M = AO + i\,\frac{\sum fd}{N} \qquad (3.3)$$

Table 3.2. Calculation of the mean

Score        f      d     fd

310-319      1     10     10
300-309      2      9     18
290-299      4      8     32
280-289      1      7      7
270-279      6      6     36
260-269     12      5     60
250-259     11      4     44
240-249      8      3     24
230-239      2      2      4
220-229      0      1      0
210-219      3      0      0

            50           235

Σfd = 235
i Σfd/N = 47.00
M = 214.5 + 47.00 = 261.50
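
Formula (3.3) can be checked against Table 3.2, and it can also be shown that the choice of arbitrary origin does not affect the result. In the sketch below the function name `mean_by_steps` and the second origin (264.5, the midpoint of 260-269) are our own choices for illustration.

```python
def mean_by_steps(midpoints, freqs, width, origin_index):
    """Mean by formula (3.3): M = AO + i * sum(f*d) / N.

    origin_index selects which interval's midpoint serves as AO;
    d is the deviation from AO in step intervals.
    """
    ao = midpoints[origin_index]
    n = sum(freqs)
    sum_fd = sum(f * (k - origin_index) for k, f in enumerate(freqs))
    return ao + width * sum_fd / n, sum_fd

# Data of Table 3.2, lowest interval (210-219) first.
midpoints = [214.5 + 10 * k for k in range(11)]
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]

# AO at the lowest midpoint, as in Table 3.2: every d is zero or positive.
m_low, fd_low = mean_by_steps(midpoints, freqs, 10, 0)

# AO at the midpoint of 260-269: d runs from -5 to +5, so fd values are smaller.
m_mid, fd_mid = mean_by_steps(midpoints, freqs, 10, 5)

print(m_low, fd_low, m_mid, fd_mid)
```

Both origins give the same mean; only the size and sign of the numbers handled along the way differ.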


In our algebraic derivation of formula (3.3) the only restriction placed
[...] If an arbitrary origin and deviations therefrom in terms of step
intervals [...] If we had taken AO near the center of the distribution we
would be [...]

Parenthetically, it might be pointed out that the use of the arbitrary
origin, step-interval scheme is analogous to using coded scores. If we
[...] by M = K + k times the mean of the coded scores.

The beginning student who is puzzled about which measure to use, the
[...] determined on [...] If six men run 100 yards in 9.6, 9.7, 9.8, 9.9,
10.0, and 14.0 seconds, the mean value 10.5 is not as typical as the median
value of 9.85. In general the mean is not as typical as the median when
there are extreme measures in one direction. [...] closer agreement than
the two medians. This point will be discussed in more detail in the chapter
on sampling errors. (2) It can [...] N_1 cases [...] N_2 cases, the mean of
the two groups combined will be given by

$$M = \frac{N_1 M_1 + N_2 M_2}{N_1 + N_2}$$

The median cannot be handled in such a fashion. Furthermore the mean [...]
is a constant and M the mean of the original scores [...] If all the scores
are multiplied by a constant, C, the new mean will be CM, whereas dividing
by a constant will lead to M/C as the new mean.
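
Two of the points just made lend themselves to a quick check. The sprint times are those given in the text; the two small groups used for the combined mean, and the multiplier C = 3, are invented for illustration, and the function name `combined_mean` is ours.

```python
from statistics import mean, median

# Six sprint times from the text: one extreme value pulls the mean upward.
times = [9.6, 9.7, 9.8, 9.9, 10.0, 14.0]
m_times = mean(times)
mdn_times = median(times)

def combined_mean(n1, m1, n2, m2):
    """Mean of two groups combined: (N1*M1 + N2*M2) / (N1 + N2)."""
    return (n1 * m1 + n2 * m2) / (n1 + n2)

# Hypothetical groups, used only to check the formula against pooling.
group_1 = [2, 4, 6]
group_2 = [10]
by_formula = combined_mean(len(group_1), mean(group_1),
                           len(group_2), mean(group_2))
by_pooling = mean(group_1 + group_2)

# Multiplying every score by a constant C multiplies the mean by C.
C = 3
scaled_mean = mean([C * x for x in group_1])

print(round(m_times, 2), round(mdn_times, 2), by_formula, by_pooling, scaled_mean)
```

The combined mean needs only the two Ns and the two means; no comparable shortcut exists for the median, which requires the pooled scores themselves.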
MEASURES OF VARIATION

The description of the extent of scatter (or cluster) about the central
value [...] differ somewhat in interpretation and usefulness. One may use
the range, but it is determined by the location of just two individual
measures or scores, and tells us nothing about the general clustering of
the scores. [...] quartile deviation (Q), defined as (Q_3 - Q_1)/2, in which
Q_3 (or the third quartile) is the point above which one-fourth of the cases
fall and Q_1 (or the first quartile) is the point with three-fourths of the
cases above. Q_2 (or the median) has already been defined as the point [...]
cases fall. The computation of the two quartiles Q_3 and Q_1 from grouped
data is essentially the same as that of the median. For instance, in
determining the third quartile we count up to the interval in which the point
falls which divides the number of cases into two parts: three-fourths below
and one-fourth above. The distance into this interval is found in exactly
the same manner as in computing the median. Since the quartiles are not
influenced by extremes, it is customary to use them along with the median.
By definition 50 per cent of the cases fall between the first and third
quartiles, but in skewed distributions it is not the case that the range
indicated by the median plus and minus Q will include 50 per cent. It
would seem better to report both the first and third quartiles instead of Q;
these values along with the median make it possible to picture
whether or not the clustering above the median is different from that below
it.

“Idle, Closely allied to the quartiles are the 

percentile is defined as a point below which ^ P* ° f * ^ first 
Thus the median is the 50th, the third quartile the 75th, and the nrst 
auartilethe 25th percentile. The 10th, 20th, • ■ • 90th percentiles are 
sometimes called dedles. The computation of the percentiles from grouped 
data is accomplished in the manner indicated for computing the quartiles. 
The location of the zeroth and 100th percentiles is always e ™ g ' ^ 
these two points are dependent upon the Iocat,0 T 
tie are greatly influenced by chance), they are difficult to interpret 
Common feme would suggest that the concept of these two percentiles be 

dr °Peme d ntiles may readily be associated with the cumulative frequency 
distribution, and with the ogive curve if cumulative percentage frequencies 






(obtained by dividing the cumulative fs by N) are used along the ordinate
when plotting the ogive. In fact, the ogive may be used as a graphic scheme
for determining score values corresponding to given percentiles. For
instance, if we wish to obtain the 25th percentile point, we find 25 on the
ordinate scale, proceed horizontally to the ogive curve, then vertically to
the x axis, and read the score value corresponding to the 25th percentile.
Scrutiny of Fig. 2.4 will help the student understand the process. Could we
also use the ogive as a basis for determining the percentile value of a
given score?
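
The counting-and-interpolating procedure for quartiles and percentiles can be sketched as follows. The frequencies are those of Table 3.3; the lower real limits (209.5, 219.5, ...) are an assumption inferred from the interval midpoint 214.5, and the function name `grouped_percentile` is hypothetical.

```python
def grouped_percentile(lower_limits, freqs, width, p):
    """Point below which p per cent of the cases fall.

    Counts up to the interval containing the point, then interpolates
    linearly within that interval, as is done for the median.
    """
    n = sum(freqs)
    target = p / 100 * n          # number of cases that must lie below
    cum = 0                       # cases below the current interval
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum + f >= target:
            return lower + (target - cum) / f * width
        cum += f
    return lower_limits[-1] + width


# Table 3.3, listed from the lowest interval (210-219) upward.
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]
lower_limits = [209.5 + 10 * k for k in range(len(freqs))]

q1 = grouped_percentile(lower_limits, freqs, 10, 25)
med = grouped_percentile(lower_limits, freqs, 10, 50)
q3 = grouped_percentile(lower_limits, freqs, 10, 75)
Q = (q3 - q1) / 2                 # quartile deviation

print(q1, round(med, 2), q3, Q)
```

Reporting `q1`, `med`, and `q3` together, rather than `Q` alone, shows whether the clustering above the median differs from that below it.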

The use of the difference between percentiles as an indication of
dispersion should be obvious. In fact, the 10th-90th percentile range is a
somewhat better (more stable from sample to sample) measure of dispersion
than the quartile deviation. Percentiles, however, are chiefly of value in
reporting the scores of individuals on psychological and educational tests.
Ordinarily a raw score gives no inkling of what it means, whereas when it is
said that an individual scores at or near the 85th percentile, the
implication is that 15 per cent of his fellows score higher or better than
he. Thus a percentile score carries with it some idea of the location of the
individual with reference to the group. Furthermore, percentile scores for
entirely different tests are comparable if derived from the same group or
sample. The original raw scores might be in different units, e.g., number of
additions [...] of prose, and consequently not at all comparable.

The average deviation. Sometimes called the mean deviation or mean
variation, the average deviation (AD) is defined as the average of the
deviations of the several scores from the mean. Thus, if x = X − M, then
$AD = \sum |x| / N$, where |x| is the absolute value of x, i.e., the
negative deviations are treated as though positive. Currently the average
deviation is seldom used; the student, however, needs to know something
about it if he reads the earlier research literature in psychology.

Contrasted with the quartile deviation, the average deviation gives
weight to extremes, and for the usual bell-shaped distribution the limits M
plus and minus AD will include about 57.5 per cent of the cases; the
average deviation is larger than Q but not so large as the standard
deviation to which we now turn.
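
As a small illustration of the definition, the sketch below computes AD for the six running times quoted earlier in the chapter and compares it with the standard deviation taken with N in the denominator. The variable names are illustrative only.

```python
import math

# The six 100-yard times quoted earlier in the chapter (seconds).
times = [9.6, 9.7, 9.8, 9.9, 10.0, 14.0]

N = len(times)
M = sum(times) / N
deviations = [X - M for X in times]             # x = X - M

AD = sum(abs(x) for x in deviations) / N        # average deviation
S = math.sqrt(sum(x * x for x in deviations) / N)

print(round(AD, 3), round(S, 3))
```

For this set the average deviation comes out smaller than the standard deviation, since squaring gives the extreme time of 14.0 more weight.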

Standard deviation. A third measure of variation, the standard
deviation, S, is defined as

$$S = \sqrt{\frac{\sum x^2}{N}} \tag{3.4}$$

where x = X − M. To compute the standard deviation directly from this
formula would be very cumbersome and uneconomical, since x will usually
involve decimals. A computational formula involving deviations from an
arbitrary origin (AO) can be easily derived by algebra. Such a derivation is
included here in order further to familiarize the student with the method of
handling summation signs. The derivation will be carried through for $S^2$,
technically known as the variance; then at the end we can take the square
root to obtain S.

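From formula (3.4) we have

$$S^2 = \frac{\sum x^2}{N}$$

in which x = X − M.

As in deriving formula (3.3), we can set

$$X = AO + id$$

and since $M = AO + i(\sum d/N)$, we have, substituting in x = X − M,

$$x = AO + id - \left[AO + i\,\frac{\sum d}{N}\right] = id - ic$$

where for convenience we let c stand for $\sum d/N$. Then

$$x^2 = (id - ic)^2 = i^2(d - c)^2$$

$$\sum x^2 = i^2 \sum (d - c)^2 = i^2\left(\sum d^2 - 2c\sum d + Nc^2\right)$$

Dividing both sides by N, we have

$$S^2 = \frac{\sum x^2}{N} = i^2\left[\frac{\sum d^2}{N} - 2c\,\frac{\sum d}{N} + \frac{Nc^2}{N}\right] = i^2\left[\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2\right] = \frac{i^2}{N^2}\left[N\sum d^2 - \left(\sum d\right)^2\right]$$

hence

$$S = \frac{i}{N}\sqrt{N\sum d^2 - \left(\sum d\right)^2}$$

But since this form does not make explicit the fact that each d, and each
$d^2$, must be summed as often as it occurs, we will insert f for the
frequency of occurrence. Thus our computational formula becomes

$$S = \frac{i}{N}\sqrt{N\sum fd^2 - \left(\sum fd\right)^2} \tag{3.5}$$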

where $\sum fd$ = the algebraic sum of deviations (in step intervals) from an
arbitrary origin, and $\sum fd^2$ = the sum of the squares of the deviations
(in step units). The arbitrary origin may be taken as the midpoint of the
lowest interval or as a guessed average near the center of the distribution.
The advantage of the latter procedure is that the ds will be relatively small
and consequently will not lead to the handling of large numbers, whereas the
first procedure avoids the use of negative numbers and is more readily
adaptable to machine computation.

The computation of S for grouped scores is illustrated in Table 3.3, which
is identical to Table 3.2 except that we now have an $fd^2$ column.

Table 3.3. Computation of S by use of an arbitrary origin

Score        f     d     fd    fd^2
310-319      1    10     10     100
300-309      2     9     18     162
290-299      4     8     32     256
280-289      1     7      7      49
270-279      6     6     36     216
260-269     12     5     60     300
250-259     11     4     44     176
240-249      8     3     24      72
230-239      2     2      4       8
220-229      0     1      0       0
210-219      3     0      0       0
Sum         50          235    1339

By formula (3.5):

$$S = \frac{10}{50}\sqrt{50(1339) - (235)^2} = 21.66$$

It is easily seen that the $fd^2$ values can be obtained by multiplying the
fd values by the corresponding ds. If we regard d as a coded score
(= $X_c$) with i as the constant k, we see that (3.5) is appropriate for
computing S by way of coded scores.
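
The arithmetic of Table 3.3 can be retraced in a few lines. This is only a sketch of formulas (3.3) and (3.5) applied to the tabled frequencies; the variable names are illustrative.

```python
import math

# Table 3.3: frequencies f and step deviations d, listed from the top
# interval (310-319) down to the bottom interval (210-219).
freqs = [1, 2, 4, 1, 6, 12, 11, 8, 2, 0, 3]
steps = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
AO = 214.5   # arbitrary origin: midpoint of the lowest interval
i = 10       # size of the step interval

N = sum(freqs)
sum_fd = sum(f * d for f, d in zip(freqs, steps))
sum_fd2 = sum(f * d * d for f, d in zip(freqs, steps))

M = AO + i * sum_fd / N                              # formula (3.3)
S = (i / N) * math.sqrt(N * sum_fd2 - sum_fd ** 2)   # formula (3.5)

print(N, sum_fd, sum_fd2, round(M, 2), round(S, 2))
```

Both required sums come out of one pass over the table, which is the economy the arbitrary-origin method is meant to provide.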

The fd and $fd^2$ columns need not appear on the work sheet when we are
computing the mean and standard deviation by a Monroe or Marchant or
Friden type calculating machine. The two required sums can be obtained
by punching in the lowest d in the right-hand part of the keyboard and
the corresponding $d^2$ just left of the center of the keyboard, multiplying
both simultaneously by the given frequency, and then, without clearing
the lower dial, punching in the next larger d and its square, and so on. The
successive products so obtained will be accumulated by the machine so that
$\sum fd$ is read directly from the right-hand side of the lower dial, and
$\sum fd^2$ is read from near the center of the same dial. If either an 8- or
10-bank machine is used, the ds of 9 and less are punched in the right-hand
column of the keyboard, and higher values will of course require the first
two columns. The squares of the ds will ordinarily be less than 400, rarely
greater than 961, so that their values can be punched in columns 6, 7, and 8.
The student should note that the squares of 1, 2, and 3 are to be punched in
column 6, the squares of 4 to 9 in columns 6 and 7, and the squares of 10
to 31 in columns 6, 7, and 8. The sum of the squares will appear in the
lower dial from window 6 to the left. With a little practice the two
required sums for a distribution of 15 intervals and 200 cases can be
obtained in less than a minute. It should not be necessary to say that the
computation should be done twice as a check.

For use with a calculator, formula (3.5) has an advantage over formulas 
which involve two divisions under the radical. Thus we place the sum of 
the squares in the right-hand side of the keyboard, multiply by N, and 
leaving the product in the lower dial, punch the sum of the ds in the
keyboard and subtract it $\sum fd$ times, and then from the dial copy the
value of $N\sum fd^2 - (\sum fd)^2$.

Briefly summarizing, it will be noted that (1) with a machine, $\sum fd$ and
$\sum fd^2$ taken from an arbitrary origin at the bottom of the distribution
are no more difficult to compute than when taken from a guessed average,
(2) all sums are positive, and (3) the two sums necessary for determining
both the mean and standard deviation can be obtained in the same operation.
It is helpful to write the d column in red on the work sheet, thereby
throwing it into contrast with the f column.

When N is small and the scores are not too large, S can be computed
economically by way of the original (raw) scores. The definition formula,
(3.4), calls for $\sum x^2$. Note that since each x = X − M, we have

$$\sum x^2 = \sum (X - M)^2 = \sum X^2 - 2M\sum X + \sum M^2$$

Replacing the last $\sum$ by N (we are summing $M^2$ N times) and replacing M
by $\sum X/N$, we have

$$\sum x^2 = \sum X^2 - 2\,\frac{\sum X}{N}\sum X + N\left(\frac{\sum X}{N}\right)^2 = \frac{N\sum X^2 - 2\left(\sum X\right)^2 + \left(\sum X\right)^2}{N}$$

$$\sum x^2 = \frac{1}{N}\left[N\sum X^2 - \left(\sum X\right)^2\right] \tag{3.6}$$

Substituting in formula (3.4) leads to an $N^2$ in the denominator, which can
be brought out as 1/N. Hence we have

$$S = \frac{1}{N}\sqrt{N\sum X^2 - \left(\sum X\right)^2} \tag{3.7}$$

All the scores are simply squared and then summed to get $\sum X^2$, and
$\sum X$ has the same meaning as in formula (3.2).
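
To see that the raw-score formula (3.7) and the definition formula (3.4) agree, one can run both on a small set of scores. The five scores below are hypothetical and chosen only to keep the arithmetic easy to follow by hand.

```python
import math

scores = [12, 15, 9, 18, 11]   # hypothetical raw scores X

N = len(scores)
sum_X = sum(scores)                   # sum of the scores
sum_X2 = sum(X * X for X in scores)   # sum of the squared scores

# Formula (3.7): raw-score computation.
S_raw = math.sqrt(N * sum_X2 - sum_X ** 2) / N

# Formula (3.4): definition in terms of deviations x = X - M.
M = sum_X / N
S_def = math.sqrt(sum((X - M) ** 2 for X in scores) / N)

print(sum_X, sum_X2, round(S_raw, 4), round(S_def, 4))
```

The raw-score route needs only the two running totals, so no deviation ever has to be written down.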
Although a mean computed by formula (3.3) from grouped data will
not err systematically from the value obtained by formula (3.2), the use of
formula (3.5) for calculating S tends to give a value which is too large when
compared with the nonapproximate value yielded either by (3.4) or by
(3.7). The reason for this is easily explained at the blackboard; we give
here a hint. In general for an interval below the mean there will be more
scores above than below the midpoint of the interval, whereas for an
interval above the mean there will be more scores below than above the
midpoint. Thus in taking the several midpoints as representing the scores
within the several intervals, we are in effect using values which deviate too
far from the mean.

We may correct for the systematic error involved in using formula (3.5)
by substituting in

$$S_{cor} = \sqrt{S^2 - \frac{i^2}{12}} \tag{3.8}$$

The $i^2/12$ is known as Sheppard’s correction for grouping. The uncorrected
and corrected values differ but little when 12 or 15 intervals have been used,
and as the number of intervals is increased, the difference becomes smaller
and smaller. If less than 10 intervals have been used, the error may be
appreciable and the correction should be applied. These considerations
form the basis for the suggested rule that at least 10 or 12, and not more
than 20, intervals be used.
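
A brief sketch of formula (3.8) applied to the grouped data of Table 3.3 shows how small the correction is with eleven intervals. The sums 235 and 1339 are taken from the table; the variable names are illustrative.

```python
import math

# Sums from Table 3.3.
N, i = 50, 10
sum_fd, sum_fd2 = 235, 1339

# Uncorrected variance and standard deviation, formula (3.5).
S_squared = (i ** 2 / N ** 2) * (N * sum_fd2 - sum_fd ** 2)
S = math.sqrt(S_squared)

# Sheppard's correction for grouping, formula (3.8).
S_cor = math.sqrt(S_squared - i ** 2 / 12)

print(round(S, 2), round(S_cor, 2))
```

The corrected value is slightly smaller than the uncorrected one, in line with the remark that grouping makes S a little too large.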

Regarding the interpretation of the standard deviation, it can be said
that, when we have the usual symmetrical bell-shaped distribution, about
68 per cent of the cases will fall between the limits plus and minus 1S from
the mean, about 95 per cent between plus and minus 2S, and nearly all the
cases (99.73 per cent) between plus and minus 3S. The standard deviation,
even more than the average deviation, gives weight to extremes and
therefore may not be as good as the quartiles for describing the dispersion.
The standard deviation has decided advantages over other measures of
dispersion. (1) It is more stable from the sampling point of view.
(2) It can be handled algebraically, i.e., if we have two groups of $N_1$
and $N_2$ cases, with $M_1$ and $M_2$, and $S_1$ and $S_2$, as the
respective means and standard deviations, we can obtain the standard
deviation for two groups combined

$$S_c = \sqrt{\frac{N_1(M_1^2 + S_1^2) + N_2(M_2^2 + S_2^2)}{N_1 + N_2} - M_c^2} \tag{3.9}$$

where the subscript c refers to the combined group. The mean for the
combined group can be obtained by a formula given on p. 18. Formula
(3.9) can be extended for determining the standard deviation for three or
more groups combined. (3) The standard deviation is a mathematical term
which has considerable importance in more advanced statistical work. It is
usually involved in the determination of sampling errors and is the measure
of variation used in the analysis of variation and in connection with
correlational analysis. Therefore, unless there are definite reasons for not
using it, the standard deviation, instead of the average deviation or Q,
should be used as a description of the amount of dispersion.
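
Formula (3.9) can be checked against a direct computation on the pooled scores. The two groups below are hypothetical, and the helper names `mean_sd` and `pooled_sd` are illustrative; S is taken with N in the denominator, as in formula (3.4).

```python
import math


def mean_sd(scores):
    """Return (N, M, S) for a list of scores, with N in the denominator."""
    n = len(scores)
    m = sum(scores) / n
    s = math.sqrt(sum((x - m) ** 2 for x in scores) / n)
    return n, m, s


def pooled_sd(n1, m1, s1, n2, m2, s2):
    """Formula (3.9): S for two groups combined, from their Ns, Ms, and Ss."""
    m_c = (n1 * m1 + n2 * m2) / (n1 + n2)   # combined mean
    mean_square = (n1 * (m1 ** 2 + s1 ** 2)
                   + n2 * (m2 ** 2 + s2 ** 2)) / (n1 + n2)
    return math.sqrt(mean_square - m_c ** 2)


group_1 = [12, 15, 9, 18, 11]   # hypothetical scores
group_2 = [20, 22, 19, 25]      # hypothetical scores

S_c = pooled_sd(*mean_sd(group_1), *mean_sd(group_2))
S_direct = mean_sd(group_1 + group_2)[2]

print(round(S_c, 4), round(S_direct, 4))
```

The two values agree, so the combined standard deviation can be had from the group summaries alone, without returning to the individual scores.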

As an exercise, show that, if a constant is added to or subtracted from
each of a set of scores, the standard deviation does not change, and that
multiplying or dividing each by a positive constant will lead to CS or S/C,
respectively, as the new standard deviation, where S holds for the original
and C is the constant.


MEASURES OF SKEWNESS AND KURTOSIS 


If a distribution is not of the symmetrical bell-shaped type, it is not
sufficient for descriptive purposes to report only the mean and standard
deviation. We also need a measure of the lack of symmetry, i.e., of
skewness, and frequently it is desirable to describe the distribution still
further by giving a measure which indicates whether the distribution is
relatively peaked or flat-topped, i.e., a measure of kurtosis.

Skewness can be described roughly by a number of measures, such as the
difference between the mean and median divided by the standard deviation,
or in terms of quartiles or percentiles. If an adequate and stable description
of skewness is desired and if a measure of kurtosis is also needed, a method
based on moments is to be preferred.

The first four moments about the mean are defined as follows:

$$u_1 = \frac{\sum x}{N}, \qquad u_2 = \frac{\sum x^2}{N} = S^2, \qquad u_3 = \frac{\sum x^3}{N}, \qquad u_4 = \frac{\sum x^4}{N} \tag{3.10}$$

where x represents the deviation of each score from the mean of all the
scores. For purposes of computation, we can determine the moments
about an arbitrary origin, and then from these values we can obtain the
moments about the mean. This procedure has already been employed in
computing the standard deviation; i.e., we took deviations from an
arbitrary origin. (The definition of the standard deviation was in terms of
deviations from the mean.) If we use v to represent moments about an
arbitrary origin, the first four moments about AO can be defined as
follows, where d is the score deviation from AO in step units:

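$$v_1 = \frac{\sum fd}{N}, \qquad v_2 = \frac{\sum fd^2}{N}, \qquad v_3 = \frac{\sum fd^3}{N}, \qquad v_4 = \frac{\sum fd^4}{N} \tag{3.11}$$

The moments about the mean can then be readily determined from

$$\begin{aligned}
u_1 &= 0 \\
u_2 &= i^2\left(v_2 - v_1^2\right) = S^2 \\
u_3 &= i^3\left(v_3 - 3v_2v_1 + 2v_1^3\right) \\
u_4 &= i^4\left(v_4 - 4v_3v_1 + 6v_2v_1^2 - 3v_1^4\right)
\end{aligned} \tag{3.12}$$

The student should note the similarity of the formula in (3.12) for the
second moment to that given for the standard deviation [formula (3.5)].
A measure of skewness defined in terms of moments is

$$g_1 = \sqrt{\beta_1} = \frac{u_3}{u_2\sqrt{u_2}} \tag{3.13}$$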

For symmetrical distributions the value of $g_1$ will be zero; hence the
departure of $g_1$ from zero can be taken as a measure of skewness. The
deviation of $g_1$ from zero, however, must be considered in light of the
[...] (to be discussed later). The skewness is positive when $g_1$ is
positive and negative when $g_1$ is negative.

The degree of kurtosis can be described by

$$g_2 = \beta_2 - 3 = \frac{u_4}{u_2^2} - 3 \tag{3.14}$$

When $g_2$ is less than zero, the distribution tends to be flat-topped
(platykurtic), whereas for $g_2$ greater than zero it is relatively peaked
with somewhat high tails (leptokurtic). When both $g_1$ and $g_2$ are zero or
near zero, the distribution is of the usual symmetrical bell-shaped type,
which is referred to as the “normal” frequency distribution.
Formulas (3.13) and (3.14) also define $\beta_1$ and $\beta_2$, which have
been and are still used as measures of skewness and kurtosis. Recently, the
g measures have come into use because of certain advantages that need not be
discussed here.

It will be noted that the measure of skewness involves taking the third
moment relative to $S^3$ (since $u_2 = S^2$), and that the measure of kurtosis
depends on the fourth moment relative to $S^4$. For a given distribution, all
the values of $u_2$, $u_3$, and $u_4$ are in terms of the same measurement
unit, say
inches or pounds or IQs or minutes; hence the ratios in formulas (3.13) 
and (3.14) are pure numbers, i.e., are not inches or pounds or IQs or 
minutes. If we have the distribution of the weights and of the heights 
for 1000 individuals, the measure of skewness for the height distribution 
may be compared directly with that for the weight distribution. This is 
true by virtue of the fact that for each we are expressing the third moment 
relative to the amount of variability, both in inches for one distribution, 
both in pounds for the other. Likewise, it can be reasoned that the 
measures of kurtosis for different distributions are comparable, although 
the distributions involve different measurement units. 
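
The route from moments about an arbitrary origin to the g measures can be sketched for the grouped data of Table 3.3. The code follows formulas (3.11) through (3.14) as reconstructed here, and then checks the result against moments taken directly from the interval midpoints; the function names `v` and `central` are illustrative.

```python
# Table 3.3: frequencies and step deviations, top interval first.
freqs = [1, 2, 4, 1, 6, 12, 11, 8, 2, 0, 3]
steps = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
AO, i = 214.5, 10
N = sum(freqs)


def v(r):
    """r-th moment about the arbitrary origin, in step units (3.11)."""
    return sum(f * d ** r for f, d in zip(freqs, steps)) / N


v1, v2, v3, v4 = v(1), v(2), v(3), v(4)

# Moments about the mean, formula (3.12).
u2 = i ** 2 * (v2 - v1 ** 2)
u3 = i ** 3 * (v3 - 3 * v2 * v1 + 2 * v1 ** 3)
u4 = i ** 4 * (v4 - 4 * v3 * v1 + 6 * v2 * v1 ** 2 - 3 * v1 ** 4)

g1 = u3 / (u2 * u2 ** 0.5)   # skewness, formula (3.13)
g2 = u4 / u2 ** 2 - 3        # kurtosis, formula (3.14)

# Direct check: moments about the mean computed from interval midpoints.
mids = [AO + i * d for d in steps]
M = sum(f * X for f, X in zip(freqs, mids)) / N


def central(r):
    return sum(f * (X - M) ** r for f, X in zip(freqs, mids)) / N


print(round(u2, 2), round(g1, 3), round(g2, 3))
```

Because `g1` and `g2` are ratios of moments to powers of S, they carry no measurement unit, which is what allows comparison across distributions.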

In order to help the reader visualize the meaning of different values for 
$g_1$ as associated with different degrees of asymmetry, Fig. 3.1 has been
prepared. 





Fig. 3.1. Polygons with different degrees of skewness. 
When we have determined the mean and the second, third, and fourth
moments, and from the moments have derived expressions which tell us the
degree of dispersion, skewness, and kurtosis, we have a description adequate
for most distributions. These measures can be used to determine the type
of mathematical equation which will fit an observed frequency polygon;
i.e., we can write the equation of a frequency curve which fits the observed
frequency distribution. A distribution frequently found in psychological
research is of the “normal” type, which is sufficiently described by the
mean and standard deviation. Ordinarily it is not necessary to compute
$g_1$ unless the distribution “appears” to be skewed or to compute $g_2$
unless the distribution seems peaked or flat. The nature of the research, the
type of variable being studied, and also the size of the sample are factors
which need to be considered in making a decision as to the necessity for
computing measures of skewness and kurtosis. It is seldom advisable to
compute these measures when N is less than 100.

The student should be apprised of the fact that the rather frequent 
occurrence of symmetrical distributions for psychological variables may 
result from an artifact, and also that the occurrence of a skewed
distribution may likewise be artifactual. This is true because very few of
the instruments used in psychological “measurement” involve equal unit
scales; the
measuring units are frequently arbitrary or even accidental. Many of the 
variables are measured simply in terms of the number of items checked or 
the number of items correct. The shape of the resulting distributions is 
largely determined by the percentage checking the items or by the difficulty 
of the items. If the items are of medium difficulty for a group, it can be 
expected that the scale will yield a symmetrical distribution when applied 
to the group; if the items are easy, the scores will pile up toward the top 
(give negative skewness); if difficult, a piling up toward the bottom will 
occur. In the absence of equal scale units for the measuring devices it 
cannot really be said whether the distribution of, for example, arithmetic 
ability for a given group is symmetrical or skewed—all that can be said is 
that in terms of the units used the distribution has a particular shape. 

From the foregoing it would seem that, since skewness (and kurtosis too) 
is partly a function of the accidental nature of the measuring units the 
descriptive measures of shape would have little value in psychology. The 
fact remains, however, that sometimes it is desirable to specify the skewness 
and kurtosis of a distribution of scores merely as a part of the description 
of what happens when a scale of measurement, however arbitrary the units, 
is applied to a given group. Furthermore, it is to the student’s advantage 
to know something of measures of skewness and kurtosis, because we shall 
later have occasion to refer to them, and because he is apt to encounter
them in more mathematical treatments of statistics. 




Chapter 4 

DISTRIBUTION CURVES 


By successive smoothing of a polygon (or distribution), we can iron out 
irregularities until the polygon becomes a “smooth” or regular and uniform 
curve. We can think of this curve as being similar or nearly identical to 
what we would obtain were we to increase indefinitely the size of our sample 
and at the same time use smaller and smaller grouping intervals. That is, the 
limit of a polygon, as we allow N to approach infinity and the interval size 
to approach zero, is conceived to be a curve which is smooth and regular. 
Now such a uniform curve can usually be described in terms of a
mathematical equation. The student may recall that the general equation for a
straight line is y = ax + b, and that y = 2x + 3 is the equation for a
particular line, that $x^2 + y^2 = a^2$ is the equation for a circle of
radius a with the origin or intersection of the abscissa and ordinate at the
center, also that $y = a + bx + cx^2$ is the general equation for a parabola.
It is
not until we give specific numerical values to the constants that we have 
equations for particular curves. 

Frequency curves can be thought of as representing the relationship 
between two variables: y, or the height of the curve, and x, the variate or
variable under consideration. Frequency polygons or distributions, even 
when smoothed, may be of various shapes: symmetrical or skewed, 
flat-topped or steep, humped near the center or at one end, bimodal or 
unimodal, J-shaped or U-shaped, falling off gradually or suddenly, etc. 
A complete description of a frequency distribution is obtained when we 
have succeeded in writing the equation of the curve which “fits” the
distribution. The type of curve to be fitted is chosen on the basis of certain
criteria derived from the moments and the interrelations among the 
moments. The late Professor Karl Pearson developed the mathematics of a 
system of frequency curves and classified distributions according to several 

types of curves, but a complete exposition of these types is beyond the
scope of this text. 

Normal curve. A bell-shaped curve which is often approximated closely
by frequency distributions and which is intimately involved in much of
statistical inference is known as the normal curve. We need to know in
detail the properties of this curve.

At this point we need to digress briefly to discuss a problem of notation.
The mean and standard deviation have been defined in terms of an observed
batch of scores for N persons, presumably selected or drawn as a sample
from some defined population of persons. The symbols, M and S, stand
for the sample mean and standard deviation. It is convenient to have
symbols for the corresponding population values (parameters). Let us
let $\mu$ (mu) stand for the population mean and $\sigma$ (sigma) symbolize
the population standard deviation. Rarely will we have numerical values for
$\mu$ and $\sigma$; M and S may be regarded as estimators of $\mu$ and
$\sigma$.

The general equation for the normal distribution may be written as

$$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-(X-\mu)^2/2\sigma^2} \tag{4.1}$$

for a population of N scores or observations, or as

$$y = \frac{N}{S\sqrt{2\pi}}\, e^{-(X-M)^2/2S^2} \tag{4.2}$$

for a sample of N scores having $g_1$ and $g_2$ values so near zero that one
may regard the distribution as normal in form (within the limits of chance,
or sampling, error, yet to be discussed). Equations (4.1) and (4.2) involve
two well-known mathematical constants, $\pi$ (3.1416) and e (2.7183). In each
equation, y is the height of the curve for any value of the variable X. In
order to write the equation for a particular normal curve, i.e., one which
corresponds to a particular distribution, we need N, $\mu$ or M, and $\sigma$
or S. This is the basis for saying that when we have the usual bell-shaped
(normal) distribution, we need only the mean and standard deviation along
with N to describe it adequately. Referring again to equations (4.1) and
(4.2), we note that the numerator part of the exponent could be written in
terms of deviation units, i.e., with x instead of X − $\mu$ in (4.1) or
X − M in (4.2).
The y for a positive deviation of, say, 10 will be exactly the same as that 
for a negative deviation of 10 for the simple reason that the deviation is 
squared. This indicates that the normal distribution is symmetrical about 
the mean, and therefore the mean and median coincide. When x = 0, i.e., 
an X falls at the mean, y has its maximal value, and therefore the mean and 
mode also coincide. For values of x other than zero, the height of the 
curve will be less than that at the mean. This is evident if it is noted that 
the exponent of e is negative. As we go away from the mean, the height of
the curve becomes less and less; the dropping off is slow at first, then
rapid, [...]. If we take the [...] above the mean [...] zero (asymptotic).
Theoretically, the curve never reaches the base line.

For both the frequency polygon and [...] the frequency for a given interval
[...] ordinate, but for smoothed [...] curves such as that defined by
equation (4.1), [...] regard the [...] already been given on p. [...]
deviation. The limits plus and minus $2\sigma$ include 95.45 per cent; [...]
plus and minus $4\sigma$, 99.9936 per cent [...] no distribution of
observations in psychology can ever follow the normal curve insofar [...]
the distribution are concerned.
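
The properties of the normal curve described above (symmetry about the mean, maximum height at the mean, and the percentages within given limits) can be verified numerically. The sketch below evaluates equation (4.2) and uses the error function from Python's standard `math` module for areas; the N, M, and S values are those of Table 3.3 and serve only as an example, and the function names are illustrative.

```python
import math


def normal_ordinate(X, N, M, S):
    """Height y of the normal curve at score X, following equation (4.2)."""
    return N / (S * math.sqrt(2 * math.pi)) * math.exp(-((X - M) ** 2) / (2 * S ** 2))


def area_within(k):
    """Proportion of a normal distribution within plus and minus k standard deviations."""
    return math.erf(k / math.sqrt(2))


N, M, S = 50, 261.5, 21.66   # example values from Table 3.3

y_at_mean = normal_ordinate(M, N, M, S)
y_above = normal_ordinate(M + 10, N, M, S)   # positive deviation of 10
y_below = normal_ordinate(M - 10, N, M, S)   # negative deviation of 10

for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 2))
```

Equal positive and negative deviations give the same height because the deviation is squared in the exponent.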
The transformation of scores into relative deviates, or so-called standard
scores, is accomplished by

$$z = \frac{X - \mu}{\sigma} \tag{4.3}$$

or by

$$z = \frac{X - M}{S} = \frac{x}{S} \tag{4.4}$$

[...] in terms of [...] fractions of S (or $\sigma$) instead of [...] the
scores expressed as Xs. Such a transformation is ordinarily based on a
sample, hence is accomplished by (4.4); that is, sample values are used
because the parameters, $\mu$ and $\sigma$, are usually unknowns. The
standard scores obtained by (4.4), or by (4.3) when possible, will have a
mean of zero and a standard deviation of 1, as can be easily shown. Since the
mean of the z values is their sum divided by their number, we have

$$M_z = \frac{\sum z}{N} = \frac{\sum (x/S)}{N} = \frac{1}{S}\,\frac{\sum x}{N}$$

Now $\sum x = \sum (X - M) = \sum X - NM$, but from $M = \sum X/N$ we have
$NM = \sum X$, hence $\sum x = \sum X - \sum X = 0$, always (irrespective of
distribution shape). Therefore $M_z = 0$ always.

Since $M_z = 0$, each z is a deviation from the mean of all the z values. If
these deviations are squared, summed, and divided by N we have their
variance, the square root of which gives their standard deviation. Thus,

$$S_z^2 = \frac{\sum z^2}{N} = \frac{\sum (x/S)^2}{N} = \frac{1}{S^2}\,\frac{\sum x^2}{N} = \frac{S^2}{S^2} = 1$$

hence the standard deviation is 1.

The change to standard scores is a linear transformation; a manipulation of
(4.4) leads to the (more readily recognizable) equation for a straight line.
That is,

$$z = \frac{X - M}{S} = \frac{1}{S}X - \frac{M}{S}$$

[...] between z and X. [...] linear transformation will not, of course,
alter the shape of the frequency [...] so as to make the standard deviation
equal to unity. The student who is skeptical about this transformation
business should be reminded that change of scale is commonplace. We change
inches to feet, feet to [...]; we change from the Fahrenheit to the
centigrade temperature scale, etc.

The N in the equation for the normal curve may be regarded as the total
area under the curve. (The area under a frequency polygon may be regarded
as N.) It will be of considerable convenience to regard the total area under
a normal curve as unity. With this and the concept of standard scores in
mind, we may rewrite (4.1) or (4.2) as


$$y = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \tag{4.5}$$

as the equation for the unit normal curve (unit area, unit standard
deviation, and mean of 0). Note that this is a general equation, since z
as a relative deviate may be either a standard score or the deviation of a
value (not necessarily a score) taken relative to the appropriate standard
deviation.
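
That standard scores always have a mean of 0 and a standard deviation of 1 can be confirmed on any batch of scores. The sketch below applies formula (4.4) to five hypothetical raw scores; the function name `standard_scores` is illustrative.

```python
import math


def standard_scores(scores):
    """Convert raw scores X to z = (X - M) / S, with S taken with N in the denominator."""
    n = len(scores)
    m = sum(scores) / n
    s = math.sqrt(sum((x - m) ** 2 for x in scores) / n)
    return [(x - m) / s for x in scores]


raw = [12, 15, 9, 18, 11]   # hypothetical raw scores
z = standard_scores(raw)

mean_z = sum(z) / len(z)
sd_z = math.sqrt(sum(value ** 2 for value in z) / len(z))

print([round(value, 3) for value in z], round(mean_z, 6), round(sd_z, 6))
```

The order of the scores and their relative spacing are unchanged; only the origin and the unit of the scale have been altered.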

Areas under the curve can be determined by methods of the calculus. The
area under the curve between any two values, $z_1$ and $z_2$, is obtained as
the value of the integral

$$\int_{z_1}^{z_2} y\,dz \tag{4.6}$$

Perhaps this expression will be more meaningful to the student who has
not studied integral calculus if the given area is regarded as composed of a
large number of strips, each having a tiny base dz and a height of y. For
each such strip the area will be nearly y dz, and the integral sign in formula
(4.6) simply means the “sum of” the areas of these tiny strips.
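
The "sum of tiny strips" reading of formula (4.6) can be carried out literally. The sketch below adds up strips under the unit normal curve (4.5) between the mean and z = .8, taking each strip's height at its midpoint; the number of strips and the function names are arbitrary choices for illustration.

```python
import math


def unit_normal(z):
    """Ordinate of the unit normal curve, equation (4.5)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def strip_area(z1, z2, strips=10000):
    """Approximate the integral (4.6) as a sum of strips of base dz and height y."""
    dz = (z2 - z1) / strips
    return sum(unit_normal(z1 + (k + 0.5) * dz) * dz for k in range(strips))


area = strip_area(0.0, 0.8)                    # from the mean to z = +.8
exact = 0.5 * math.erf(0.8 / math.sqrt(2))     # same area via the error function

print(round(area, 5), round(exact, 5))
```

With enough strips the sum settles on the tabled proportion, which is all the integral sign asks for.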

The student of the calculus will also note that the first derivative of
y in (4.1) or (4.5) set equal to zero and solved will yield a [...] the mean
and mode coincide. If the second derivative is set equal to zero and solved
for x or z, it will be found that the points of inflection of the curve are
located where x is $\pm\sigma$ or z is $\pm 1$.

Normal curve table. Because of the widespread use of the normal
curve, tables of proportionate frequencies and ordinates for various z
values are available. The student need not be able to integrate equation [...]

Fig. 4.2

[...] the area from the mean to +.8 is found from column 2 as .28814; the
area below this point is .78814, and that above is .21186, of the total
area. Note that .78814 plus [...]

Q = .8453 AD = .6745 S
AD = 1.1829 Q = .7979 S
S = 1.4826 Q = 1.2533 AD

It is also useful to know that for an N of 50 the S will be about one-fifth
the range, that for an N of 200 the S will be about one-sixth the range, and
that for an N of 1000 the S will be about one-seventh the range.

The tabled values for the normal curve are often used in connection with 
problems similar to the following. If a distribution of the heights of men 
is normal with a mean of 68.0 inches and a standard deviation of 2.5, what 
percentage of men are more than 6 feet tall? We find z as the difference
between 72 and 68, divided by S, or z = 1.6; then from Table A we find
the percentage of cases that fall above this z value to be 5.48. Suppose that
the mean IQ of 10-year-old boys is 100 and the standard deviation 16. 
What percentage have IQs between 90 and 110? What percentage of 
10-year-old boys would be classified as “gifted” (IQ above 140)? 

In practice, the answers to the foregoing questions would be approximate 
because M and S would be used in lieu of the population values, and 
because obtained distributions will not be exactly normal in form. 
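
Problems of this kind can be worked without a printed table by computing the normal area directly. The sketch below uses the error function from Python's standard `math` module in place of Table A, which is a substitution made here for illustration; the function name `proportion_below` is hypothetical, and the figures are those given in the problems above.

```python
import math


def proportion_below(z):
    """Proportion of the unit normal curve lying below the standard score z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Heights: mean 68.0 inches, standard deviation 2.5; taller than 72 inches.
z_tall = (72 - 68.0) / 2.5
pct_tall = 100 * (1 - proportion_below(z_tall))

# IQ: mean 100, standard deviation 16.
pct_iq_90_to_110 = 100 * (proportion_below((110 - 100) / 16)
                          - proportion_below((90 - 100) / 16))
pct_gifted = 100 * (1 - proportion_below((140 - 100) / 16))

print(round(pct_tall, 2), round(pct_iq_90_to_110, 1), round(pct_gifted, 2))
```

Each answer follows the same two steps as the worked example: convert the score limits to z values, then read off the corresponding area.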

The student will have noted that the answers to problems similar to the 
foregoing are possible by virtue of the fact that the areas and ordinates of 
Table A are for the standard score form of the normal curve with total 
area set equal to unity. By formula (4.4), we can pass from raw scores to 
standard scores and vice versa, and knowing N, we can readily convert 
proportionate areas to frequencies or frequencies to proportions. Thus 
the table can be used with any normal distribution regardless of the original 
measurement units. 

Standard scores. Perhaps it should be pointed out at this place that 
transforming scores, when distributions are normal or approximately so, 
to standard scores leads to new sets of scores which are comparable. For 
example, inches and pounds are not comparable units. If a man is 71 
inches in height and weighs 170 pounds, it is impossible to say whether he 
is taller than he is heavy, but when the 71 inches is transformed to a z of .9
and the 170 pounds to a z of 1.3, we are able to say that, relative to his
position in the two distributions, he is heavier than he is tall. Likewise,
the raw scores on two psychological tests will seldom be comparable;
changing to standard scores permits comparison, so that it can be decided
whether a boy’s performance on one test is better or worse than his
performance on another. This assumes, of course, a close approximation to
normality, and that the means and standard deviations used in the
transformations are based on the same or highly similar groups.

Standard scores, as defined by formula (4.4), will involve both positive 
and negative values and decimal scores. Since these are awkward to use, 
a further transformation is frequently made in such a way as to yield a 
distribution with a preassigned mean and standard deviation, instead of 
the 0 and 1 that hold for the standard scores defined by formula (4.4). If 
we wish a distribution with a mean of 50 and a standard deviation of 10, 
we simply multiply each z by 10 and add 50. Multiplying each z by 20
and adding 100 would yield a mean of 100 and an S of 20. Either of these
transformations will get rid of negative values and permit a sufficient
number of score values without the use of decimals. In general, if we wish
to transform a set of scores having a mean, M, and a standard deviation,
S, to new values to be called Zs, with mean equal to any value K and
standard deviation equal to S', all we need to do is to apply the relationship

\[ Z = z(S') + K, \quad \text{or} \quad Z = \frac{X - M}{S}(S') + K \]

which becomes

\[ Z = \frac{S'}{S}X - \frac{S'}{S}M + K \tag{4.7} \]

The last form is the easier to use in practice, particularly with a calculating
machine. Note that the last two terms will combine numerically and
therefore can be placed in the lower dial as a positive or negative number; 
then the numerical value of S'/S can be set in the keyboard as a constant 
to be multiplied in turn upon the varying values of X. If the machine has 
a continuous upper dial, the best procedure is to multiply by the highest 
X first, and then, without clearing the dials, to subtract once for each 
successively lower value of X. Care is needed in aligning decimals, a check 
on which can be obtained by multiplying by the X nearest M. This should 
lead to a value, in the lower dial, that is near K. With this setup, we can 
readily run off a table that gives the values of Z for varying values of X. 
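
The same bookkeeping can be sketched in a few lines of code. This is only an illustration of formula (4.7), assuming Python; the function name `rescale` and the example figures (the height distribution used earlier, rescaled to mean 50 and standard deviation 10) are chosen here for demonstration.

```python
def rescale(x, mean, sd, new_mean, new_sd):
    """Formula (4.7): Z = (S'/S)X - (S'/S)M + K."""
    ratio = new_sd / sd                  # S'/S, the constant multiplier
    constant = new_mean - ratio * mean   # the last two terms combined
    return ratio * x + constant

# Run off a small table of Z for varying X (M = 68.0, S = 2.5, K = 50, S' = 10).
table = {x: rescale(x, 68.0, 2.5, 50, 10) for x in range(63, 74)}
for x, z_new in table.items():
    print(x, z_new)
```

As the text suggests for the machine procedure, a quick check is that the X nearest M (here 68) yields a value near K (here 50).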

The comparability of two sets of standard scores, either as zs or as Zs
with the same mean (K) and same S', does not hold for skewed distributions
unless the two distributions show the same degree and direction of
skewness. This is unlikely to be the case in practice. There is a scheme
for use with skewed distributions which not only leads to comparable
units but which also normalizes the distributions, i.e., changes the
distributions from skewed to normal. This procedure is known as T scaling,
and the resulting scores are known as T scores. They are usually so
calculated as to yield a mean of 50 and an S' of 10, but other values for these
constants are possible. The detailed procedure may be found in McCall’s
Measurement,* which also includes a table for expediting the transformation.
Suffice it to say here that T scaling basically involves determining
the proportion (or percentage) of cases exceeding a given value plus half
those on that value, and then entering such proportions in a table of the
normal curve function to find the corresponding z values. Standard scores
based on a normal distribution of original scores and T scores based 

* McCall, W. A., Measurement, New York: Macmillan, 1939, pp. 505-508. 
on any shape distribution are comparable, provided they have been
determined so as to yield the same mean and standard deviation.
They differ only in the way in which they are computed, the standard
score being a linear transformation which leaves the shape of the
distribution unchanged, whereas T scaling changes the distribution to the normal
form. If we begin with an exactly normal distribution and convert the
scores to both Zs and Ts, there will be a linear correspondence between
the two sets of transformed scores. If their means and sigmas are set
equal, the Zs and Ts will be equal to each other.
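
The T scaling idea described above can be sketched as follows. This is a simplified illustration, not McCall's tabled procedure: it assumes Python, uses `statistics.NormalDist().inv_cdf` in place of a normal curve table, and the function name `t_scores` and the sample scores are hypothetical.

```python
from collections import Counter
from statistics import NormalDist

def t_scores(scores, new_mean=50.0, new_sd=10.0):
    """Map each distinct score to a normalized T score.

    For each value: take the proportion of cases exceeding it plus half
    of those on it, then find the z that cuts off that upper-tail area.
    """
    n = len(scores)
    counts = Counter(scores)
    unit_normal = NormalDist()
    result = {}
    below = 0  # running count of cases below the current value
    for value in sorted(counts):
        on_value = counts[value]
        exceeding = n - below - on_value
        upper_area = (exceeding + 0.5 * on_value) / n
        z = unit_normal.inv_cdf(1.0 - upper_area)
        result[value] = new_mean + new_sd * z
        below += on_value
    return result

sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # small symmetrical illustration
ts = t_scores(sample)
for value, t in ts.items():
    print(value, round(t, 1))
```

Because half the cases on a value are always counted, the upper-tail proportion never reaches 0 or 1, so a z can be found for every score.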

It will be recalled that the use of percentiles is another way of expressing
scores on different tests so as to have comparability. The student should
give sufficient thought to percentiles and standard scores to see how they
are interrelated when the original scores are normal in distribution. (Hint:
The tabled functions (Table A) of the normal curve may help.) The student
might also demonstrate to his own satisfaction that the difference between
the 50th and 60th percentile points is not apt to be equal to the difference
between the 80th and 90th percentile points.
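
For a normal distribution, the unequal spacing of percentile points can be seen directly in z units. A brief sketch, assuming Python's `statistics.NormalDist` as a substitute for Table A (the variable names are illustrative):

```python
from statistics import NormalDist

z_at = NormalDist().inv_cdf  # z value below which a given proportion falls

gap_50_60 = z_at(0.60) - z_at(0.50)
gap_80_90 = z_at(0.90) - z_at(0.80)

print(round(gap_50_60, 3), round(gap_80_90, 3))
```

The second gap is the larger: equal steps in percentile rank correspond to wider steps in score as one moves away from the center of the distribution.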

Kinds of distributions. In anticipation of topics to be discussed, it
might be well to mention some possible ways of regarding frequency
distributions. We can have an observed, or sample, distribution of scores
for a group of N individuals; we can imagine a population distribution of
scores for either a finite or for an infinite N; and we can conceive of a
distribution curve defined by a mathematical equation (or function).
Because of chance factors (as yet undefined herein) we do not expect an
observed sample distribution to be exactly like the distribution of the
population from which the sample is drawn or like a defined mathematical
distribution.

Since we are seldom able to measure all members of a population, we
can only assume that population scores follow some defined mathematical
distribution. The form of mathematical curve assumed is usually decided
upon by a consideration of the shape of an observed sample distribution.
As will be seen later, the reasonableness of the assumption can be checked
statistically.

It is possible, however, to show mathematically that under prescribed
conditions given measures will follow a defined distribution curve exactly.
We shall refer to such a distribution as theoretical or expected. Strictly
speaking, a mathematical distribution curve holds only for a continuous
variable. If we had the distribution for a discrete variable, such as number
of children per family, we would never expect that increasing N would
produce a curve; the variable takes on only point values 0, 1, 2, etc.,
hence we cannot allow the interval size (see p. 29) to approach zero,
which is necessary for a smooth curve.
As implied previously, there are distribution curves which are not
normal; we shall introduce other curves (or functions) when needed.
Thus far, the normal curve has been discussed as a frequency curve, and the
area interpretation has been in terms of the number of individuals or
scores falling between certain score limits. This same curve is
often spoken of as the normal probability curve, and as such it is regarded
as a theoretical curve. We shall see, moreover, that there are theoretical
curves other than the normal curve which may be regarded as probability
curves.





Chapter 5 

PROBABILITY AND 
HYPOTHESIS TESTING 


Statistical inference and the testing of hypotheses involve the concept of 
chance, or probability. A simple example will serve to illustrate the 
probabilistic nature of hypothesis testing. Suppose a chap claims that he 
can distinguish between Camels and Lucky Strikes. To test his claim we 
could blindfold him and present him with either a Camel or a Lucky 
Strike (the brand to be presented is determined by tossing a coin). If on 
this one trial he correctly names the brand, we would not be inclined to 
accept his claim since he would have a 50-50 chance of being correct on a 
sheer guessing basis. So we give him a second trial (again, and for any 
subsequent trials, we toss a coin to determine which brand to present to 
him). If he were again successful we might give some credence to his claim 
but someone might ask whether making two correct discriminations could 
happen on the basis of chance. We shall presently see that the chances are 
1 in 4 of getting two correct, i.e., success on two trials could easily occur on 
the basis of chance. 

But suppose he is correct on three trials, then on the fourth trial, and 
also on the fifth; or perhaps he is correct on ten trials, or perhaps on 9 of 10
trials? Regardless of the number of trials and the number of successes we
certainly should have some information about chance success, or the 
probability of correctly naming the brands on the basis of chance guessing, 
before we reach a decision regarding the claimed ability to distinguish 
between the two brands of cigarettes. This and similar decision problems 
involve notions of probability, to which we now turn. 

Probability. If we had a box containing 70 white and 30 black balls, 
well mixed, and were to draw 1 ball at random, the chance of the drawn
ball’s being black is said to be 30 out of 100, or .30, and the chance of its
being white would be .70. This can be interpreted to mean that, if we made 1000
random draws, each time replacing the drawn ball and remixing the
contents of the box, the percentage of black balls drawn would be about 30,
and of white draws about 70. If we roll a die, the probability of obtaining
a 4 is 1/6; i.e., a large number of rolls would yield a 4 about 1/6 of the time.
If one tosses a symmetrical coin, it is usually said that there is a 50-50
chance of its landing “heads up”, or the probability of a head is 1/2. This
is another way of saying that in the long run the proportion of times that
the coin lands as a head will be the same as the proportion of times it lands
as a tail.

These very simple examples illustrate a definition of probability: if an
event can happen in A ways and fail in B ways, all possible ways being
equally likely, the probability of its occurring is A/(A + B) and of its
failing is B/(A + B). That is, a probability figure is the ratio of the
number of favorable events to the total number of events, and it is therefore
necessary that we be able to enumerate events in order to arrive at a
probability figure.

If we draw a card from a pack, the probability of obtaining a spade is 1/4,
and the probability of drawing a club is also 1/4, but the probability of
drawing either a spade or a club is 1/4 plus 1/4, or 1/2. If we roll a die, the
probability of obtaining either a 4 or a 5 is 1/6 plus 1/6, or 1/3. These two
situations illustrate the addition theorem of probability: the probability
that either one event or another event will happen is the sum of the
probabilities of their occurrences as single events. (The events must be mutually
exclusive; i.e., if one occurs, the other cannot.)

If we roll a pair of dice, the probability of a 2 on the first and a 5 on the
second is 1/6 times 1/6, or 1/36. If we toss 2 coins, the probability that the first
will land a head and the second a head is 1/2 times 1/2, or 1/4, which is, of course,
the probability that both will land as heads. Notice that the result
obtained with the second die or coin is independent of the outcome of the
first die or coin. These two examples illustrate the multiplication theorem:
the probability of two (or more) independent events’ occurring
simultaneously or in succession (one and the other) is the product of their
separate probabilities.
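
Both theorems can be verified by plain enumeration of equally likely outcomes. A minimal sketch, assuming Python with exact fractions; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Addition theorem: a 4 or a 5 on one die (mutually exclusive events).
favorable = sum(1 for face in faces if face in (4, 5))
p_four_or_five = Fraction(favorable, len(faces))

# Multiplication theorem: a 2 on the first die and a 5 on the second
# (independent events), counted over all 36 equally likely pairs.
pairs = list(product(faces, repeat=2))
favorable_pairs = sum(1 for first, second in pairs if first == 2 and second == 5)
p_two_then_five = Fraction(favorable_pairs, len(pairs))

print(p_four_or_five, p_two_then_five)
```

Counting gives the same figures as adding 1/6 and 1/6 in the first case and multiplying 1/6 by 1/6 in the second.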

As just indicated, if we toss 2 coins, the probability that the first will
land a head and also the second a head will be 1/2 times 1/2, or 1/4, which is the
probability that both will fall as heads. The probability that the first will
land a head and the second a tail will also be 1/2 times 1/2, or 1/4. But 1 head and
1 tail can be obtained in a manner mutually exclusive to the above; i.e., the
first can land as a tail and the second as a head, and this combination or
event has a probability of 1/4, whence the probability of obtaining 1 head
and 1 tail will be 1/4 plus 1/4, or 1/2. This same result can be arrived at by
listing all the possible combinations and taking the ratio of the number of
favorable to the total number of possible combinations. The possible
combinations are HH, HT, TH, TT, from which we see that 2 out of the 4
possible events are favorable for the occurrence of 1 head and 1 tail. We
also note that 1 out of 4 is favorable to 2 heads.

Suppose we were to toss 3 coins; we would have the following possible 
combinations: 

Coin 1   H H H H T T T T
Coin 2   H H T T H H T T
Coin 3   H T H T H T H T

The total number of possible “events” is 8, 1 of which is favorable to 3
heads, 3 to 2 heads, 3 to 1 head, and 1 to no heads, thus giving the
respective probabilities of 1/8, 3/8, 3/8, and 1/8. If we were to toss 4 coins,
we would have the following probabilities:

4 heads   1/16        1 head   4/16
3 heads   4/16        0 head   1/16
2 heads   6/16

The student should satisfy himself that these are the correct figures by 
writing down all the combinations possible and counting those favorable 
to any particular number of heads. 
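
The writing down and counting of combinations can also be delegated to a short program. A sketch assuming Python; the function name `head_count_probabilities` is introduced here only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def head_count_probabilities(n_coins):
    """List every H/T combination and count those favorable to each number of heads."""
    outcomes = list(product("HT", repeat=n_coins))  # 2**n_coins equally likely events
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {heads: Fraction(counts[heads], len(outcomes))
            for heads in range(n_coins, -1, -1)}

for heads, prob in head_count_probabilities(4).items():
    print(heads, "heads:", prob)
```

Note that `Fraction` reduces automatically, so 4/16 prints as 1/4 and 6/16 as 3/8.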


BINOMIAL DISTRIBUTION 

The process of determining possible combinations becomes quite
laborious for, say, 10 coins, but the several probabilities can be obtained
by the coefficients in the expansion of the binomial (a + b)^n. Thus for
n = 2 (i.e., 2 coins) we have a^2 + 2ab + b^2, or 1, 2, 1; for n = 3,
a^3 + 3a^2b + 3ab^2 + b^3, or 1, 3, 3, 1; for n = 4 the coefficients are
1, 4, 6, 4, 1. In each case the sum of the coefficients, 2^n, will be the total
possible combinations, and the coefficients taken as ratios with the common
denominator, 2^n, will represent the probabilities for n, n − 1, n − 2, ..., 0 heads.

The student may recall that the general expansion of (a + b)^n is

\[ a^n + n a^{n-1} b + \frac{n(n-1)}{1 \times 2} a^{n-2} b^2 + \frac{n(n-1)(n-2)}{1 \times 2 \times 3} a^{n-3} b^3 + \cdots \]

This expansion will contain (n + 1) terms and will terminate in b^n. For
n = 10, we have the following coefficients: 1, 10, 45, 120, 210, 252, 210,
120, 45, 10, 1, which sum to 1024, or 2 to the tenth power. Thus the
probability that all 10 coins will fall as heads is 1/1024; 9 heads, 10/1024;
etc. If we plot these values as a frequency polygon (these coefficients are
frequencies in the sense that they represent the expected number of times
for 10 heads, 9 heads, etc., out of a total of 1024 tosses) we will have a
bell-shaped graph which will resemble somewhat the normal curve.
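
The coefficients for any n may be generated rather than expanded by hand. A small sketch, assuming Python 3.8 or later for `math.comb`; the helper name `binomial_coefficients` is illustrative.

```python
from math import comb

def binomial_coefficients(n):
    """Coefficients of (a + b)**n, ordered from a**n down to b**n."""
    return [comb(n, k) for k in range(n + 1)]

coefficients = binomial_coefficients(10)
total = sum(coefficients)

print(coefficients)
print(total)  # the common denominator, 2**10
```

Dividing each coefficient by the total gives the probabilities for 10 heads, 9 heads, and so on down to 0 heads.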

Another and more useful way, for our purpose, of considering the
binomial expansion is to use p and q in the place of a and b, with p defined
as the probability of success on a single element and q as the probability of
failure, or q = 1 − p. Thus we would have (p + q)^n. Suppose n = 2;
the expression would be p^2 + 2pq + q^2. If p = 1/2, the coin situation,
this would give (1/2)^2 + 2(1/2)(1/2) + (1/2)^2, or 1/4, 2/4, and 1/4 as the
probabilities for 2 heads, 1 head, and 0 head respectively. Each term is itself
a probability fraction; the numerators are 1, 2, and 1 as before. For
n = 10, we have (1/2)^10 or 1/1024, 10(1/2)^9(1/2) or 10/1024, 45(1/2)^8(1/2)^2 or
45/1024, etc., as the probabilities for obtaining 10 heads, 9 heads, 8 heads,
etc.

The chief advantage of using the p and q notation is that we can readily
see what happens when p is not equal to 1/2. Consider the expectation when
we roll a pair of dice with “success” on each die defined as the rolling of a
one-spot (2 one-spots being “snake eyes”). We would have
(1/6 + 5/6)^2 = 1/36 + 2(5/36) + 25/36 as indicating the
probability of obtaining 2 one-spots, 1 one-spot, and 0 one-spot. If 3 dice
were rolled, we would have 1/216 + 3(5/216) + 3(25/216) + 125/216, or
1/216, 15/216, 75/216, and 125/216 for 3, 2, 1, and 0 one-spots. The
important thing for the student to note is that these probabilities are definitely
skewed, whereas the coin probability distributions are of the symmetrical type.
The student can, as a tedious exercise, work out the probabilities for 4, 5, 6,
7, or 8 dice and therefrom learn that the shape of the distribution changes
from marked skewness to less and less skewness as the number of dice is
increased. It can easily be shown that, if p = 5/6 and q = 1/6, the skewness
will be in the opposite direction. Another proposition which the student
can demonstrate is that, for a fixed n, the skewness increases as
p departs farther from 1/2 in either direction: extremely small or extremely
large ps (near unity) lead to very marked skewness.
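
The tedious exercise is easily mechanized. The sketch below, assuming Python with exact fractions, expands (p + q)^n term by term; the function name `binomial_probabilities` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

def binomial_probabilities(n, p):
    """Terms of (p + q)**n, ordered from n successes down to 0 successes."""
    q = 1 - p
    return [comb(n, k) * p**k * q**(n - k) for k in range(n, -1, -1)]

one_spot = Fraction(1, 6)  # probability of a one-spot on a single die

two_dice = binomial_probabilities(2, one_spot)
three_dice = binomial_probabilities(3, one_spot)
eight_dice = binomial_probabilities(8, one_spot)

print([str(term) for term in two_dice])
print([str(term) for term in three_dice])
print([round(float(term), 4) for term in eight_dice])
```

Reversing p and q (p = 5/6) yields the same terms in the opposite order, which is the reversal of skewness mentioned above.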

The binomial expansion provides the probabilities, or the theoretically
expected frequencies, for given ns, ps, and qs. Such theoretical distributions
can be described as to central value, variation, skewness, and kurtosis.
The numerical values for these measures may be obtained by direct
computation from the distributions built up by the binomial expansion, or
these measures may be obtained by simple formulas, which can be derived
by simple algebra, without having the actual distributions available. The
formulas are

\[ \mu = np \]
\[ \sigma = \sqrt{npq} \]
\[ \gamma_1 = \frac{q - p}{\sqrt{npq}} \quad \text{(skewness)} \]
\[ \gamma_2 = \frac{1 - 6pq}{npq} \quad \text{(kurtosis)} \]

Since these formulas, which are for theoretical distributions, specify 
parameters (not values based on a sample), Greek instead of Latin letters 
are used as symbols. 

It should be noted that n is the number of elements, not the number of 
cases. The formula for skewness permits several deductions. When p = 1/2,
q also equals 1/2 and hence the skewness is zero; the degree of skewness for a
fixed n depends upon the deviation of p from 1/2, i.e., the smaller or the larger
the probability of success for each element, the more skewed the
distribution. Note also that, since n is in the denominator, the larger the number
(n) of elements, the smaller the skewness for fixed values of p and q.
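
The agreement between the simple formulas and direct computation can be checked numerically. This is a sketch under the assumption that skewness and kurtosis are taken as the standardized third moment and the standardized fourth moment less 3; the function names are hypothetical.

```python
from math import comb, sqrt

def moments_by_enumeration(n, p):
    """Mean, sigma, skewness, and kurtosis computed from the built-up distribution."""
    q = 1.0 - p
    probs = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum(k * pr for k, pr in enumerate(probs))
    var = sum((k - mean) ** 2 * pr for k, pr in enumerate(probs))
    third = sum((k - mean) ** 3 * pr for k, pr in enumerate(probs))
    fourth = sum((k - mean) ** 4 * pr for k, pr in enumerate(probs))
    return mean, sqrt(var), third / var**1.5, fourth / var**2 - 3.0

def moments_by_formula(n, p):
    """The same four measures from the simple formulas."""
    q = 1.0 - p
    npq = n * p * q
    return n * p, sqrt(npq), (q - p) / sqrt(npq), (1.0 - 6.0 * p * q) / npq

direct = moments_by_enumeration(10, 1 / 6)
formula = moments_by_formula(10, 1 / 6)
print(direct)
print(formula)
```

Trying larger n with the same p shows the skewness shrinking, and p = 1/2 gives a skewness of zero, as deduced above.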

The above formulas describe the theoretically expected distribution for 
given ns, ps, and qs. As will be seen later, any empirical distribution 
obtained by tossing 10 coins or rolling 3 dice will yield values which, for 
reasons to be discussed, will only approximate these values. 

It is of interest to consider plotting the binomial distribution as a 
histogram—the height of the successive bars will indicate the several 
expected frequencies, each of which is the numerator for a probability 
fraction. Now, if we work out the expected frequencies for number of 
heads when 20 coins are tossed, and if in drawing the histogram we scale 
the ordinate so as to have the over-all height about the same as that for the 
10-coin situation and also squeeze the base-line scale (ranging from 0 to 20) 
into about the same over-all distance as for 10 coins, the vertical bars will 
be narrower, and the resulting picture will look more like a normal 
histogram than that obtained for 10 coins. If we repeat the process with 
n larger and larger, each time scaling our axes to about the same size as 
used for 10 coins and for 20 coins, the several bars of the histograms will 
become narrower and narrower, and with n sufficiently large the bars will 
seem to merge and the contour of the graph will tend to appear
indistinguishable from a normal curve.

The normal curve is for a continuous variable on the x axis, whereas 
the binomial distribution involves a discrete variable, or point series. For 
example, it is impossible to have any values between, say, 22 and 23 heads. 
As n is taken larger and larger, and the total base line is kept fixed, the 
obtained values or possible points become more and more closely spaced
so that the point series approaches, or at least takes on the appearance of, 
continuity. As n approaches infinity, the binomial distribution approaches 
the normal distribution as a limit. 

Approximation of probabilities. The foregoing suggests the possibility 
of using the normal curve as a basis for approximating the probabilities 
obtainable by the binomial expansion. In order to see how this might be 
done we shall consider the binomial distribution for n = 16 for the coin
tossing situation, as shown in Table 5.1. Suppose we wish to ascertain the 


Table 5.1. Binomial distribution for 16 coins

Number of   Expected        Number of   Expected
  Heads     Frequencies       Heads     Frequencies

   16             1             7         11,440
   15            16             6          8,008
   14           120             5          4,368
   13           560             4          1,820
   12         1,820             3            560
   11         4,368             2            120
   10         8,008             1             16
    9        11,440             0              1
    8        12,870
                                Total     65,536



probability of getting at least 12 heads. This would be the sum of the 
separate probabilities of tossing 12, 13, 14, 15, and 16 heads. These 
probabilities would be the respective “expected frequencies” each divided 
by 65,536; hence the sum of the probabilities would be obtained by 
summing the numerators: 1, 16, 120, 560, and 1820, then dividing this 
sum, 2517, by 65,536. Thus the probability of securing at least 12 heads 
(12 or more) would be 2517/65,536, or a decimal equivalent of .03841 
(to 5 places). 
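
The summing of numerators can be sketched as a short computation, assuming Python; the function name `exact_at_least` is illustrative.

```python
from math import comb

def exact_at_least(heads, coins):
    """Numerator, denominator, and probability of at least `heads` heads."""
    numerator = sum(comb(coins, k) for k in range(heads, coins + 1))
    denominator = 2 ** coins
    return numerator, denominator, numerator / denominator

numerator, denominator, probability = exact_at_least(12, 16)
print(numerator, denominator, round(probability, 5))
```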

Now let us attempt to find the same probability by using the normal 
curve approximation. First we note that for the distribution in Table 5.1 
the mean will be np = 16(.5) = 8 and the σ will be √(npq) = √(16(.5)(.5)) = 2.
It will help us understand the method of approximation if we superimpose
on the histogram of the frequencies in Table 5.1 a normal curve having a
mean of 8 and a σ of 2 (see Fig. 5.1). If we regard the area of each bar as
representing an expected frequency, we see that the sum of the areas for 
the bars based on 12, 13, 14, 15, and 16 heads divided by the total area of 
all the bars (= 65,536) will give the probability value of .03841 reported 
previously. To approximate this probability we need the area under the
curve [...]. Obviously, we need the [...] when it is recalled that we are
treating a discrete variable as though it were a continuous variable. Hence
we have z = (11.5 − 8)/2 = 3.5/2 = 1.75. Turning to Table A we find that
the proportionate area [...] the exact probability value of .03841, the
error [...] from 12 to 11.5 [...]. The exact probability of obtaining 10 or
11 or 12 heads, for example, is .21661. The normal curve approximation,
calculated as the proportionate area under the curve from [...]
[...] the area between two z values taken relative to a total area may be
regarded as a probability figure. This is not inconsistent with our original
definition of probability involving number (frequency) of events [...] to a
total number of events [...]. Since, as indicated, the total area under a
frequency curve (for a defined curve or function) can be regarded as the
total frequency and the area of a segment can be regarded as the frequency
with which values (or scores) fall in the given segment, it follows that the
ratio of the segmental to the total area [...]. [...] the probability that a z,
drawn at random from a normally distributed supply of zs, will be
numerically larger than 1.96 is .05. Similarly it can be said that the
probability of drawing a z between ±2.576 is very nearly .99, whereas the
probability for a z falling outside these limits is .01.

[...] the use of three nonnormal probability distributions.
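
The normal curve approximation to the probabilities of Table 5.1 can be sketched as follows. This illustration assumes Python's `statistics.NormalDist` in place of Table A and assumes that each number of heads is treated as occupying an interval extending half a unit on either side (so that 12 heads begins at 11.5); the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

unit_normal = NormalDist()

def approx_at_least(heads, n, p=0.5):
    """Area above (heads - .5) under a normal curve with mean np and sigma sqrt(npq)."""
    mean, sigma = n * p, sqrt(n * p * (1 - p))
    z = (heads - 0.5 - mean) / sigma
    return 1.0 - unit_normal.cdf(z)

def approx_between(low, high, n, p=0.5):
    """Area from (low - .5) to (high + .5) under the same normal curve."""
    mean, sigma = n * p, sqrt(n * p * (1 - p))
    z_low = (low - 0.5 - mean) / sigma
    z_high = (high + 0.5 - mean) / sigma
    return unit_normal.cdf(z_high) - unit_normal.cdf(z_low)

tail_12_or_more = approx_at_least(12, 16)    # exact value: .03841
heads_10_to_12 = approx_between(10, 12, 16)  # exact value: .21661

# Areas regarded as probabilities for the unit normal curve.
outside_1_96 = 2 * (1.0 - unit_normal.cdf(1.96))
inside_2_576 = unit_normal.cdf(2.576) - unit_normal.cdf(-2.576)

print(round(tail_12_or_more, 5), round(heads_10_to_12, 5))
print(round(outside_1_96, 3), round(inside_2_576, 3))
```

Both approximations fall within about .002 of the exact binomial values, which is close agreement for an n as small as 16.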


HYPOTHESIS TESTING


We now return to a consideration of the blindfold test of
ability to distinguish between two cigarette brands. By using the binomial
expansion we can readily specify the probability of being correct (by
chance) n times out of n trials. The answer is simply 1/2^n; if there were
10 trials the probability of 10 correct choices (by chance guessing—no real 
discriminatory ability) would be 1/1024, or about .001; the probability 
of being correct 16 out of 16 trials would be 1/65,536, or about .000015. 

If our self-proclaimed expert did succeed in 10 of 10 trials we would, 
because of the small probability of 10 successes by chance, concede that he 
really possessed the ability to discriminate between the two brands. 

But suppose he was successful on 9 trials of a 10-trial series? We could 
readily specify the probability of 9 successes by chance (it would be 
10/1024) but for reasons which will become apparent later, it is better to 
ascertain the probability of as many as 9 successes in 10 trials (at least 9, 
or 9 or more, successes). This probability will be the probability of exactly 
9 successes plus the probability of exactly 10 successes, or 10/1024 + 
1/1024 = 11/1024 = about .01, which is sufficiently small that we might 
decide that his performance was based on ability rather than on chance. 
Note that such a record would occur by chance about 1 time in 100, so 
we couldn’t be sure that he really had the ability. 

Next, let us suppose that he was correct on 8 of the 10 trials. The
probability of at least 8 successes occurring on a chance basis would 
be 45/1024 + 10/1024 + 1/1024 = 56/1024 = about .05. Would we now 
conclude that he had the claimed ability? If we did so conclude we 
wouldn’t be as sure of our inference as when there were 9 successes, and 
far less sure than when there were 10 successes. In other words, the smaller 
the probability of attaining an obtained number of successes by chance 
the surer we would be of our conclusion. If he were successful on 7 trials 
(probability = P = .17 for 7 or more successes) we would no doubt 
hesitate before conceding that his performance was based on ability to 
discriminate, since 7 successes can too easily occur on the basis of chance 
alone. 
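
The chance probabilities quoted for the 10-trial series can be reproduced by summing binomial coefficients. A minimal sketch assuming Python; the function name `chance_of_at_least` is illustrative.

```python
from fractions import Fraction
from math import comb

def chance_of_at_least(successes, trials):
    """Probability of at least `successes` correct in `trials` by sheer guessing (p = 1/2)."""
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return Fraction(favorable, 2 ** trials)

for successes in (10, 9, 8, 7):
    prob = chance_of_at_least(successes, 10)
    print(successes, "or more:", prob, "=", round(float(prob), 3))
```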

We are thus led to the question: What level of probability should be 
adopted as a criterion for deciding whether an observed performance is 
based on ability rather than chance? We are not yet ready to attempt an 
answer to this, but it might be remarked here that in choosing a level of 
probability it is necessary to consider the risk of being wrong in concluding 
that the fellow can discriminate vs. the risk of attributing his performance 
to chance when in reality he does have some ability. 

Whether a person can discriminate between two brands of cigarettes is a 
simple illustration of the problem of statistical inference, or the testing of 
hypotheses. For purpose of inference we set the hypothesis that our friend 
cannot discriminate between brands. This readily permits us to calculate 
the probability (P) of as many successes by chance as he attains on a series 
of trials; if P is sufficiently small we reject the hypothesis of no ability, and
in so doing we are saying that his number of successes is statistically
significant, that is, nonchance. The level of significance associated with
rejection of the hypothesis is represented by a probability: if we agree to
reject the hypothesis only when the probability of chance success is as low
as .01, we will have adopted the P = .01 level of significance. If we are
willing to be less sure and require P to be as low as .05 we will be working at
the .05 level of significance. Whether we adopt the .01 or the .05 level is
somewhat arbitrary; for this chapter let us quite arbitrarily choose P = .01
as our working level of significance. After considering the more detailed
discussion of this issue later in the chapter, the reader may prefer to adopt
the .05 or some other level for judging significance.

The binomial expansion (and normal curve approximation thereto) may
be used in a wide variety of situations as a means of testing hypotheses. A
requirement is that we be able to specify the probability of success (or
something analogous to success) for a single element (coin, die, trial, etc.).
In other words, we need to specify p (and q) so as to use (p + q)^n, or we need
to calculate the mean and σ in order to utilize the normal curve
approximation when n is not small.

Consider the problem of public opinion polling. In polling studies we
are frequently interested in whether or not a population of potential voters is
split 50-50 on some issue. Accordingly, we set the hypothesis that there is a
50-50 split in the population. This hypothesis is to be accepted or rejected
on the basis of information yielded by a sample of N persons who are
asked to respond “yes” (agree) or “no” (disagree) to a statement of the
given issue. Suppose for sake of simplicity we take N = 64 and that 42 of
the 64 give a yes response. Is this result consistent with the hypothesis of a
50-50 split?

To answer this we note that so far as the opinion poller is concerned there
is, on the basis of the hypothesis, a 50-50 chance that any individual in the
sample will say yes (this despite the fact that the individual so far as he is
concerned is not giving a chance response). Thus the probability of a yes
response for a single individual is 1/2 [...].
Now our sample of 64 is analogous to a trial toss of 64 coins, so we consider
the binomial distribution with n = N = 64. The mean = Np = 32, and
σ = √(Npq) = 4. The number of yes responses, 42, deviates 10 from
the mean (our normal curve approximation would be slightly better if
we used 41.5 − 32 = 9.5 as our deviate, the correction for continuity). Thus
z = 10/4 = 2.50. Turning to Table A we find that the probability
of obtaining as large a deviation in an upward direction from 32 is about
.006. To test our hypothesis of a 50-50 split we need also to include the
probability of obtaining as large a deviation in the opposite direction;
hence we double .006 and have P = .012 as the probability of as large a
deviation irrespective of direction. Since this P is very near our arbitrarily
(and temporarily) agreed upon P = .01 level for judging significance, we
reject the hypothesis of an equal split in the population being sampled, and
this rejection implies that a majority of the population would endorse the
given statement.

[...] number of responses [...] as proportions multiplied by [...] yeses
in the population, and our result of 42 yeses for a sample of 64 leads to
65.6 per cent yeses. Accordingly it would appear that in testing the
deviation of 42 from 32 we are also testing the deviation of .656 from .50
(in proportion units) or 65.6 from 50 (in percentage units).
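
The polling test just described can be sketched in code. This is an illustration only, assuming Python's `statistics.NormalDist` in place of Table A; the function name `two_tailed_test` and its `continuity` option are introduced here and are not part of the text.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_test(yeses, n, p=0.5, continuity=False):
    """Return (z, P) for the deviation of an observed count from Np."""
    mean = n * p
    sigma = sqrt(n * p * (1 - p))
    deviation = abs(yeses - mean)
    if continuity:
        deviation -= 0.5  # e.g. 41.5 - 32 = 9.5 instead of 10
    z = deviation / sigma
    one_tail = 1.0 - NormalDist().cdf(z)
    return z, 2 * one_tail  # doubled: deviations in either direction

z, p_value = two_tailed_test(42, 64)
z_cc, p_value_cc = two_tailed_test(42, 64, continuity=True)
print(z, round(p_value, 3))
print(z_cc, round(p_value_cc, 3))
```

With the correction for continuity the z is somewhat smaller and the P correspondingly larger, which matters when P lies as near the chosen level as it does here.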

Actually, what we did above was to take

\[ \frac{42 - 32}{\sqrt{64(.5)(.5)}} = \frac{10}{4} = 2.50 \]

Dividing both the numerator and the denominator by N will not change the
value of x/σ. Thus we have

\[ \frac{42/64 - 32/64}{\sqrt{64(.5)(.5)/(64)^2}} = \frac{.656 - .500}{.062} = 2.52 \]

which differs from 2.50 only because of rounding errors. [...] dividing by N
[...] preserves [...] observed sample [...]. We might, therefore, deduce
[...] Np [...] simply p, and [...] the standard deviation [...] √(pq/N).
This last term is [...] standard deviation [...]; we shall designate
this as σ_p.
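
That the count form and the proportion form lead to the same z (apart from rounding) can be confirmed directly. A brief sketch assuming Python; both function names are illustrative.

```python
from math import sqrt

def z_from_count(yeses, n, p):
    """Deviation of the observed count from Np, in units of sqrt(Npq)."""
    return (yeses - n * p) / sqrt(n * p * (1 - p))

def z_from_proportion(yeses, n, p):
    """Deviation of the observed proportion from p, in units of sqrt(pq/N)."""
    sigma_p = sqrt(p * (1 - p) / n)
    return (yeses / n - p) / sigma_p

z_count = z_from_count(42, 64, 0.5)
z_prop = z_from_proportion(42, 64, 0.5)
sigma_p = sqrt(0.5 * 0.5 / 64)

print(z_count, z_prop, sigma_p)
```

Carried to full precision the two forms agree exactly; the 2.52 in the worked example arises only because .656 and .062 were rounded before dividing.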
Thus we have Np and $\sqrt{Npq}$ as the mean and $\sigma$ for a chance
distribution of successes (in coin tossing) or of number of yes responses for
N individuals, and we have p and $\sqrt{pq/N}$ as the mean and $\sigma$ of a
chance distribution of proportion of yeses based on samples of N individuals.
In the coin tossing and similar situations, each toss or trial leads to a
countable number of successes, and the distribution of the number of
successes for successive trials tends to follow the binomial. For the polling
situation, each sample of N cases leads to a calculable proportion of yeses,
and the distribution of proportions for successive samples (of same size)
also tends to follow the binomial.

The standard deviation $\sigma_p$ is referred to as the standard error of a
proportion, a measure specifying the variability due to chance (sampling)
error. Actually, the sampling distribution of proportions is a theoretical
distribution; ordinarily we have only one sample proportion. Theory provides
us with information concerning the central value and the variability of the
distribution to be expected were we to draw a very large number of sample
proportions.

The scheme outlined previously for testing hypotheses is not, of course,
restricted to the cigarette blindfold test and polling. In the first place,
p need not be 1/2. Our setup might involve a p of 1/3 (e.g., identifying 1
of 3), and we are not restricted to the hypothesis of a 50-50 split when
polling (e.g., we might be interested in whether there is a 2 to 1 split). In
the second place, we need not limit ourselves to number of successes or
number of yeses. The fundamental requirement is that we be able to categorize
observations (or individuals) into two classes (a dichotomy) such as pass or
fail, agree or disagree, like or dislike, present or absent, cured or not
cured, etc.

The general procedure is to express the observed result as a proportion, take
its deviation from the proportion expected on the basis of a statistical
hypothesis, and divide this deviation by

$$\sigma_p = \sqrt{p_h q_h / N} \qquad (5.1)$$

This gives a z, $z = (p_{ob} - p_h)/\sigma_p$, sometimes called a critical
ratio (CR), which for N not too small and $p_h$ not too extreme will tend to
follow the normal curve, the table of which permits us to ascertain the
probability of a deviation as great as that observed. Note that the proviso
that $p_h$ cannot be extreme follows from the fact that the binomial
distribution is skewed when p is extreme, say when $p_h$ is greater than .90
or less than .10 (p. 43). Since the skewness is less marked for large N, any
rule that we might adopt to prevent unjustifiable use of the normal curve
approximation will be a function of N and $p_h$. In general when both $Np_h$
and $Nq_h$ exceed 5 we can safely use the normal curve; if either product is
between 5 and 10 we should deduct .5/N from the numerical value of the
deviation of $p_{ob}$ from $p_h$. This is another way of incorporating the
correction for continuity (p. 45).
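
The rule just stated can be written as a small function. This is a sketch under stated assumptions, not a prescribed routine: the function name `critical_ratio` is hypothetical, a product of exactly 10 is treated as falling in the "between 5 and 10" band, and a product of 5 or less is refused rather than approximated.

```python
import math

def critical_ratio(p_ob, p_h, n):
    """z for an observed proportion p_ob against a hypothesized proportion p_h.

    When the smaller of N*p_h and N*q_h lies between 5 and 10, .5/N is
    deducted from the absolute deviation (correction for continuity).
    A ValueError is raised when that product does not exceed 5.
    """
    q_h = 1 - p_h
    smaller = min(n * p_h, n * q_h)
    if smaller <= 5:
        raise ValueError("N*p_h or N*q_h does not exceed 5; use the binomial expansion")
    deviation = abs(p_ob - p_h)
    if smaller <= 10:
        deviation = max(deviation - 0.5 / n, 0.0)   # correction for continuity
    sigma_p = math.sqrt(p_h * q_h / n)              # standard error under the hypothesis
    return math.copysign(deviation / sigma_p, p_ob - p_h)

# The polling example: 42 yeses out of 64 against a 50-50 split.
z_poll = critical_ratio(42 / 64, 0.5, 64)

# A hypothetical case with p_h = 1/3: 12 successes in 24 trials (N*p_h = 8).
z_small = critical_ratio(12 / 24, 1 / 3, 24)

print(round(z_poll, 2), round(z_small, 2))
```

In the second call the corrected deviation in proportion units, divided by the standard error, agrees with the count form (12 - 8 - .5) divided by the square root of 24(1/3)(2/3).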

Formula (5.1) for $\sigma_p$ has been written with $p_h$ as a value specified
by the hypothesis to be tested. As such the formula measures the chance
variation in proportions when the hypothesis is true. Actually, saying,
“if there is a 50-50 split in opinion,” is the same as saying “if the proportion
of yeses is .50 in the population.” If we let $p_{pop}$ stand for population
proportion then the variation of sample proportions is given by substituting
$p_{pop}$ (and $q_{pop}$) in (5.1). When we have an obtained proportion,
$p_{ob}$, and do not know $p_{pop}$ (usually the case) and have no hypothesis
in mind, we use $p_{ob}$ as an estimate of $p_{pop}$, and

$$s_p = \sqrt{p_{ob} q_{ob} / N} \qquad (5.2)$$

as an approximation of the standard error of an observed proportion.

At this point the student may be somewhat confused by the use of p,
first as the probability of, say, success on a single element and then as a
proportion. Note, however, that if we were told that .30 (a proportion) of
a given group have brown eyes, we could say that the probability that a
randomly selected person has brown eyes is .30. Furthermore, when we
say that the probability of rolling a snake eye is 1/6 or .1667, we mean that
the proportion of snake eyes for a large number of rolls will tend to be
1/6.

Some sampling theory. To facilitate later discussion we shall now
introduce some notions of sampling theory. We will confine our attention
to what is known as simple random sampling. The conditions for random
sampling are that each individual (person, plant, animal, observation, etc.)
in a defined population (universe, or supply) shall have an equal chance of
being included in the sample, and that the drawing of one individual shall
in no way affect the drawing of another (that is, the drawings must be
independent of each other). The first condition is not easily met in practice.
The aim is, of course, to obtain a sample which will be, within limits of
random or chance errors, representative of the population from which it is
drawn.

When dealing with attributes, or the classification of individuals into
two (or more) categories, for which the proportion in a given category is a
useful descriptive measure, we can conceive of a population proportion,
$p_{pop}$, and a proportion, $p_{ob}$, obtained on a random sample of N cases.
Now if we could draw successive samples of N, determine $p_{ob}$ for each
sample, and then make a distribution of the several $p_{ob}$ values, we would
expect this distribution to follow the normal curve for N not small and
$p_{pop}$ not extreme. This follows from our discussion of the binomial
distribution and normal curve approximation thereto, the only difference being
that we were then speaking of a chance distribution about some hypothetical
proportion $p_h$. If $p_h$ happened to equal $p_{pop}$ we would be dealing with
precisely the same distribution of sample values. If, for example, the
hypothesis of a 50-50 split is true we would expect the distribution of
successive sample proportions to center at .5 and have a
$\sigma_p = \sqrt{p_h q_h / N} = \sqrt{(.5)(.5)/N}$; if the population
proportion, $p_{pop}$, is .5 we would expect the successive sample proportions
to have a mean of .5 and
$\sigma_p = \sqrt{p_{pop} q_{pop} / N} = \sqrt{(.5)(.5)/N}$.
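
The idea of drawing successive samples can be imitated by simulation. The sketch below is illustrative only: the function name `simulate_sample_proportions`, the seed, and the choice of 20,000 samples are assumptions made for the demonstration. It draws repeated simple random samples of N = 64 from a population in which the proportion of yeses is .5 and compares the mean and standard deviation of the sample proportions with the theoretical values.

```python
import math
import random
import statistics

def simulate_sample_proportions(p_pop, n, n_samples, seed=0):
    """Draw n_samples independent samples of size n; return each sample's proportion of yeses."""
    rng = random.Random(seed)
    return [sum(rng.random() < p_pop for _ in range(n)) / n
            for _ in range(n_samples)]

p_pop, N = 0.5, 64
props = simulate_sample_proportions(p_pop, N, n_samples=20000)

theoretical_sigma = math.sqrt(p_pop * (1 - p_pop) / N)   # sqrt(pq/N) = .0625
observed_mean = statistics.mean(props)
observed_sd = statistics.pstdev(props)

print("mean of sample proportions:", round(observed_mean, 4))
print("SD of sample proportions:  ", round(observed_sd, 4))
print("theoretical sigma_p:       ", theoretical_sigma)
```

With a finite number of samples the observed values will only approximate the theoretical ones; the agreement should tighten as the number of samples is increased.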


DIFFERENCES BETWEEN PROPORTIONS 

The testing of hypotheses need not be confined to a single proportion.
This is fortunate because in research involving attributes we are more apt
to have two proportions, and since each is subject to chance (sampling)
error, it follows that the difference between them will also be subject to
chance error. To test a hypothesis regarding the difference between two
proportions it will be necessary that we have information concerning the
theoretical random (chance) sampling distribution of the differences
between proportions. We will need to distinguish two different types of
situations: (1) proportions based on two samples drawn independently
from two populations and (2) proportions for responses or observations
obtained under two different conditions on just one sample. For either
situation we set up a statistical hypothesis known as a null hypothesis. This
hypothesis, which states that there is no difference between the population
proportions, will be rejected if the obtained difference reaches some
prescribed level of significance but will be accepted otherwise. Stated
differently, if the observed difference could readily arise on a chance basis
we accept the null hypothesis; if the probability of its occurrence by chance
is small we reject the null hypothesis. Note that our statistical hypothesis
of no difference may be, and often is, diametrically opposed to the research
hypothesis being checked by the data. That is, on the basis of theory or
prior observations we may expect a difference, yet for statistical reasons
we set the null hypothesis. If the obtained difference is statistically
significant in the expected direction we regard the data as tending to support
the research hypothesis.

Nonindependent proportions. We shall consider first the situation in
which the two proportions being compared are not based on independent
groups but on just one group (or on two related groups). Suppose we are
interested in whether a movie leads to a change of opinion, i.e., to an
increase in the proportion favorable to some issue. We select a random
sample from some defined population, get a yes (favorable) or no
(unfavorable) response from each individual, show them the movie, then again
get a yes or no response from each. Our next step is that of tabulation, as
in Table 5.2.

Table 5.2. Tabulation plan for handling proportions based on
the same individuals

              Frequencies                          Proportions

                   2nd                                 2nd
               No       Yes                        No       Yes
1st   Yes      A         B      A + B     1st Yes  a         b      p_1
      No       C         D      C + D         No   c         d      q_1
             A + C     B + D      N               q_2       p_2     1.0

For an individual who gave a yes response the first time and a
yes response the second time, a tally would go in the upper right-hand cell;
for a yes at first followed by a no, a tally would go in the upper left
quadrant; and so on. Let A, B, C, and D represent the respective
frequencies for yes-no, yes-yes, no-no, and no-yes responses. Then A + B is
the total number of yeses at first and B + D is the total number of yeses the
second time. If each of these totals is divided by N, we will have the
proportions of yeses, $p_1$ and $p_2$ respectively, for the first (or pre-) and
second (or post-) set of responses. (Note: the right-hand part of Table 5.2
is obtained by dividing the 8 frequencies in the left-hand part by N.) The
statistical hypothesis to be tested, the null hypothesis, is
that if the movie could be shown to the entire defined population the
proportion of yeses before and after would be exactly the same. This does
not mean that an individual cannot change, but it does mean that the
number of changes from yes to no balances off the number of changes from
no to yes. Thus we come to the proposition that, on the basis of the null
hypothesis, one-half of the A + D individuals who change would be expected
to change in each direction.






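Since this is precisely analogous to tossing A + D coins, we would
expect that when A + D is not too small the chance distribution of D (or of
A) will tend to follow the normal curve, with mean (A + D)/2 and
$\sigma = \sqrt{(A + D)(.5)(.5)}$. Since the number of yes to no changes is
complementary to the number of no to yes changes, a test of the significance
of the deviation of either D or A from (A + D)/2 tells us whether D differs
significantly from A. For small A + D we can use the actual binomial
expansion to evaluate the deviation, but for A + D not small we divide the
deviation by $\sqrt{(A + D)(.5)(.5)}$, which gives a critical ratio,

$$z = \frac{x}{\sigma} = \frac{D - (A + D)/2}{\sqrt{(A + D)(.5)(.5)}} = \frac{.5D - .5A}{.5\sqrt{A + D}} = \frac{D - A}{\sqrt{A + D}} \qquad (5.3)$$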
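
The tabulation plan and the critical ratio for changes can be sketched in code. This is an illustration under assumptions: the response data are hypothetical, and the function names `tabulate_pairs` and `z_correlated` are not from the text. Each individual contributes one (first, second) pair of responses, and only the two kinds of changers, A (yes-no) and D (no-yes), enter the test.

```python
import math
from collections import Counter

def tabulate_pairs(pairs):
    """Count (first, second) responses into the A, B, C, D cells of the tabulation plan."""
    counts = Counter(pairs)
    A = counts[("yes", "no")]    # yes at first, no the second time
    B = counts[("yes", "yes")]
    C = counts[("no", "no")]
    D = counts[("no", "yes")]    # no at first, yes the second time
    return A, B, C, D

def z_correlated(A, D):
    """Critical ratio: deviation of D from (A + D)/2 divided by sqrt((A + D)(.5)(.5))."""
    return (D - A) / math.sqrt(A + D)

# Hypothetical before/after responses for N = 100 individuals.
pairs = ([("yes", "no")] * 10 + [("yes", "yes")] * 30 +
         [("no", "no")] * 35 + [("no", "yes")] * 25)

A, B, C, D = tabulate_pairs(pairs)
N = A + B + C + D
p1 = (A + B) / N     # proportion of yeses the first time
p2 = (B + D) / N     # proportion of yeses the second time
z = z_correlated(A, D)

print(A, B, C, D, p1, p2, round(z, 2))
```

Note that the difference between the two proportions of yeses equals (D - A)/N, so the individuals who do not change (B and C) affect neither the numerator nor the denominator of the ratio.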


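In (5.3) both the deviation and its standard deviation are expressed as
frequencies based on A + D (similar to what we did on p. 49), but as we shall
see it is appropriate to introduce the sample size, N, into the picture.
Dividing both the numerator and the denominator of (5.3) by N does not change
the value of z:

$$z = \frac{x}{\sigma} = \frac{D/N - A/N}{\sqrt{(A + D)/N^2}}$$

If we let a = A/N and d = D/N, this may be written as

$$z = \frac{x}{\sigma} = \frac{d - a}{\sqrt{(a + d)/N}} \qquad (5.4)$$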
This form for $x/\sigma$ will make more sense if we again consider Table 5.2,
particularly the right-hand part. Note that since $a + b = p_1$ and
$b + d = p_2$ it follows that $d - a = p_2 - p_1$, and accordingly a test of
the significance of D as a deviation from (A + D)/2 is also a test of the
significance of the difference between the proportions of yeses obtained on
the two occasions.

To incorporate the correction for continuity, deduct 1/N from the
numerical (absolute) value of the numerator, d - a.

The denominator of the right-hand side of (5.4) must be a standard
deviation. Of what? Actually it is the standard deviation of the theoretical
sampling distribution of differences between proportions, each difference
being based on one sample of size N. Such a standard deviation, as we have
noted previously, is referred to as a standard error. Thus we have

$$\sigma_{D_r} = \sqrt{(a + d)/N} \qquad (5.5)$$

as the standard error of the difference between correlated proportions.
The subscript r has been added to indicate that this formula holds for
related or correlated proportions. The relationship, or correlation, concept
needs a brief word of explanation. If, by chance sampling, $p_1$ were lower
than the population value, we would expect $p_2$ also to be somewhat low;
if $p_1$ were by chance high, we would expect $p_2$ to be somewhat high; if
$p_1$ were near the population value (near average), we would expect $p_2$ to
be near average. This varying together is referred to as a co-relationship or
correlation. Stated differently, we would not expect the two proportions to
vary independently of each other for successive samples.

The proportions need not be based on the same individuals to be
correlated. For example, if we were interested in sex differences in opinion
we might randomly choose families and then ascertain the proportion of
yeses among the husbands and also among the wives; for successive
samplings the two proportions might be correlated because of a possible
tendency for husbands and wives to agree on the given issue. As a second
example, consider the setup involving the pairing of individuals for the
purpose of having comparable experimental and control groups. The fact
of pairing signifies that the two groups have not been drawn independently
in the sampling sense; hence there might be a tendency for the proportions
based on the two groups to be more or less alike. (About pairing we will
have more to say later.)

Another situation in which formulas (5.3), (5.4), and (5.5) are applicable
is the problem of judging the significance of the difference between
proportions of yeses for two different questions asked of the same sample of
cases. Since the responses to the two questions might tend to vary together,
there could be a correlation between the proportions on successive samples.

In each of the foregoing situations we have pairs of responses, and our
tabulation must follow the scheme set forth in Table 5.2; i.e., our
tabulation will lead to the frequency of yes-no, yes-yes, no-no, and no-yes
responses.

Formulas (5.3), (5.4), and (5.5) are usable in other situations. When
judging whether or not two test items differ significantly in difficulty we
ordinarily have pass-fail data for both items on the same sample of N cases.
Our tabulation leads to the frequencies for pass-fail, pass-pass, fail-fail, and
fail-pass. The kind of response is irrelevant; it need only be such that a
dichotomy is involved for each item or question.

These formulas may be safely used for any size sample provided A + D is
10 or more (and the correction for continuity is included when A + D is 10
to 20). If A + D is less than 10, the binomial expansion provides an
easily computed test of significance leading to an exact probability for as
great a difference between the proportions as that observed. The P so
obtained needs to be doubled to get the probability for as great a difference
irrespective of direction; otherwise it is the probability for as large a
difference in one direction only. About this we shall have more to say
later under the heading, “One-tailed vs. two-tailed tests,” pp. 61-63.
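
The choice among the three procedures just described can be sketched as follows. This is an illustration, not a definitive routine: the function name `correlated_proportions_test` is hypothetical, the counts used are invented, and the boundaries are read as "less than 10" for the binomial and "10 to 20 inclusive" for the corrected normal curve. In frequency units the correction for continuity amounts to deducting 1 from the absolute value of D - A.

```python
import math

def correlated_proportions_test(A, D):
    """Return (method, one-tailed P) for the change counts A (yes-no) and D (no-yes)."""
    n = A + D
    if n < 10:
        # Exact binomial: probability of a split at least as uneven in the observed direction.
        k = max(A, D)
        p = sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n
        return "binomial", p
    diff = abs(D - A)
    if n <= 20:
        diff = max(diff - 1, 0)          # correction for continuity
        method = "normal, corrected"
    else:
        method = "normal"
    z = diff / math.sqrt(n)
    return method, 0.5 * math.erfc(z / math.sqrt(2))

small = correlated_proportions_test(1, 7)      # A + D = 8
middle = correlated_proportions_test(5, 13)    # A + D = 18
large = correlated_proportions_test(10, 30)    # A + D = 40

for method, p in (small, middle, large):
    print(f"{method}: one-tailed P = {p:.4f}")
```

Each returned P is for a difference in one direction only; it would be doubled (and capped at 1) to obtain the probability irrespective of direction.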

Independent proportions. It is not easy to build up a general formula
for evaluating the difference between two proportions based on two
independent samples. We can, however, learn something about formula
construction and, incidentally, illustrate a general statistical theorem by
considering a special case involving differences between independent
proportions.

We have already seen how the binomial expansion, $(p + q)^n$, can be used
as a basis for ascertaining theoretical, or expected, frequencies for various
possible outcomes (events). Let us now see whether we can set up expected
frequencies for the joint occurrence of events. Suppose persons J and K
decide to while away some time at coin tossing. Each uses n = 5 coins for
which the binomial yields expected frequencies of 1, 5, 10, 10, 5, 1 for
5, 4, 3, 2, 1, 0 heads, with mean = np = 2.5 and $\sigma^2 = npq = 1.25$. But
instead of making just 32 tosses, each makes 1024 tosses, for which the
expected frequencies would be 32 times 1, 5, 10, 10, 5, 1, or 32, 160, 320,
320, 160, 32.

J and K decide to make simultaneous tosses in order to learn something
about joint outcomes, that is, to see how often both get 5 heads or how
often J gets 4 heads while K gets 3 heads, and so on. Now a little thought
will indicate that the total number of possible joint outcomes will be 6
times 6, or 36. To keep a record of their results, J and K would be wise to
lay out a 6 by 6 table with 0 to 5 (heads) along the bottom and also along
the left-hand side. When a particular combination occurs, say, 2 heads by
J and 4 by K, a tally mark is entered in the cell to the right of 2 and above 4
(enter with J's along the ordinate and K's along the abscissa).

Can we anticipate the frequencies in the 36 cells of the table? This we
cannot do, but we can specify the theoretically expected frequencies in
either of two ways. The first method involves use of the multiplication
theorem of probability. The probability of J obtaining 5 heads is 1/32;
the probability of K obtaining 5 heads is also 1/32. The product of these
two is 1/1024, which permits us to enter a 1 in the upper-right cell as the
expected number of times (out of 1024 simultaneous tosses) that each gets
5 heads. The probability of the joint outcome, J 2 heads and K 4 heads, is
10/32 times 5/32, or 50/1024, which permits us to enter 50 as the expected
frequency in the cell defined by 2 along the left and 4 along the bottom.
Each of the other 34 cells can be similarly filled in by the multiplication
theorem. The second method is simpler. For the 32 times we expect J to
get 5 heads, we would expect K's results to follow the binomial, hence we
can immediately write down 1, 5, 10, 10, 5, 1 in the top row of the 6 by 6
table. For the 160 times we expect J to obtain 4 heads we would again
expect K's outcomes to follow the binomial but, since 160 is five times 32,
we would need to multiply the 1, 5, 10, 10, 5, 1 by 5, giving 5, 25, 50, 50, 25,
5 as entries in the second row in the 6 by 6 table. By exactly the same line
of reasoning the other rows can easily be filled in, with results as shown in
Table 5.3.

Table 5.3. Expected frequencies for joint outcomes when J and K
each make 1024 simultaneous tosses of 5 coins

              5      1      5     10     10      5      1
              4      5     25     50     50     25      5
  J (heads)   3     10     50    100    100     50     10
              2     10     50    100    100     50     10
              1      5     25     50     50     25      5
              0      1      5     10     10      5      1
                     0      1      2      3      4      5
                                 K (heads)
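
The expected frequencies described above follow mechanically from the multiplication theorem, and a short sketch can generate them. The variable names are illustrative assumptions; the layout printed puts J = 5 in the top row and K = 0 to 5 from left to right, as the text describes.

```python
from math import comb

n = 5
tosses = 1024                                    # 32 times 32
binom = [comb(n, h) for h in range(n + 1)]       # 1, 5, 10, 10, 5, 1 for 0..5 heads

# expected[j][k]: expected number of tosses (out of 1024) on which
# J gets j heads and K gets k heads, i.e. 1024 * (binom[j]/32) * (binom[k]/32).
expected = [[binom[j] * binom[k] * tosses // (2**n * 2**n) for k in range(n + 1)]
            for j in range(n + 1)]

for j in range(n, -1, -1):                       # J = 5 in the top row
    print(f"J={j} " + " ".join(f"{expected[j][k]:4d}" for k in range(n + 1)))
print("K=  " + " ".join(f"{k:4d}" for k in range(n + 1)))
```

Indexing the table as `expected[j][k]` keeps the code independent of how the table is drawn; only the printing loop encodes the top-to-bottom order.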

When a particular cell frequency in Table 5.3 is divided by 1024 we have
a probability for a joint occurrence. Another way of interpreting a particular
cell frequency is to regard it as a mean value in the sense that if J and K
performed a very, very large number of series of 1024 tosses we would
expect the average of the obtained frequencies for that cell to correspond
to the given theoretically expected frequency. That is, any expected
frequency is to be regarded as the mean over an infinitely large number of
trials.

But we built up Table 5.3 for the ultimate purpose of saying something
about the difference between independent proportions. Suppose J and K
decide to make two additional tabulations for each pair of simultaneous
tosses: the sum of their separate outcomes, that is, the number of heads
for all 10 coins; and also the difference in number of heads, expressed
arbitrarily as J's count minus K's count. Thus for tabulating the sum of
their results they would need “intervals” 10H, 9H, ..., 1H, 0H, whereas for
the difference they would need +5, +4, ..., 0, ..., -4, -5. Again, let us
attempt to determine the expected results.

It is easy to write down the expected frequencies for the various outcomes
as sums; these would simply come from the binomial $(p + q)^{10}$. We can,
however, write them from Table 5.3. A sum of 10 (heads) can occur only
when both J and K obtain 5 heads, for which the expectation is 1 out of
1024. A sum of 9 can occur either when J gets 5 and K gets 4 or when J gets
4 and K gets 5. Since the expectation for each of these is 5, the expectation
for 9 as a sum becomes 10. A sum of 8 results from 5 and 3, 4 and 4, or
3 and 5 for J and K respectively, and these joint outcomes have expectations
of 10, 25, and 10, which add to 45. Note now that diagonal adding,
upper-left to lower-right in Table 5.3, will lead to 1, 10, 45, 120, 210, 252,
210, 120, 45, 10, 1 as expected frequencies for the possible outcomes when
J and K sum the results for each of their simultaneous tosses.

As to the difference in “scores,” when J gets 5 heads and K none we
have a difference of +5 for which the expectation is 1 (out of 1024). A
difference of +4 can arise when J gets 5 and K gets 1 or when J gets 4 and
K gets none; summing the two expectations, 5 + 5 = 10 as the expected
number of times for a difference of +4. A difference of +3 can occur in
three ways with expectations of 10, 25, and 10, which add to 45 as the
expected frequency for a difference of +3. Note that we are again
summing diagonally in Table 5.3, this time from lower-left to upper-right.
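
The two kinds of diagonal adding can be sketched directly from the joint expected frequencies. The names used here are illustrative assumptions; the joint table is rebuilt from binomial coefficients so that the block stands on its own.

```python
from math import comb
from collections import defaultdict

n = 5
binom = [comb(n, h) for h in range(n + 1)]       # 1, 5, 10, 10, 5, 1

# Joint expected frequencies out of 1024 tosses, keyed by (J's heads, K's heads).
expected = {(j, k): binom[j] * binom[k]
            for j in range(n + 1) for k in range(n + 1)}

sums = defaultdict(int)     # expected frequency of each sum, J + K
diffs = defaultdict(int)    # expected frequency of each difference, J - K
for (j, k), freq in expected.items():
    sums[j + k] += freq
    diffs[j - k] += freq

sum_freqs = [sums[s] for s in range(2 * n, -1, -1)]      # sums 10, 9, ..., 0
diff_freqs = [diffs[d] for d in range(n, -n - 1, -1)]    # differences +5, ..., -5

print(sum_freqs)
print(diff_freqs)
```

Grouping cells by j + k corresponds to adding along one set of diagonals, and grouping by j - k to adding along the other.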

The results both for sums and for differences, given in Table 5.4, are
worth scrutiny. The two distributions are identical except for their
location parameters, the mean being 5 for one and 0 for the other.
Obviously, the variances are equal. The fact that the differences have a mean
of 0 might have been anticipated, since every time J and K toss their 5
coins, each is, in effect, making a trial, a trial which represents a sample.
But each is sampling from the same universe, the universe of events when
5 coins are tossed. (It is presumed that the coins are unbiased.) J and K's
“universes” have the same mean (np = 2.5); hence in the long run it would
be expected that chance will operate in such a way that the average of
obtained differences will be zero.

Table 5.4. Expected frequencies (Ef) for sums ($\Sigma h$) and differences
($D_h$) for 1024 simultaneous tosses of two sets of 5 coins, and
differences in proportions ($D_p$)

                              Differences
      Sums            For heads    For proportions
    Σh      Ef           D_h            D_p            Ef
    10       1            +5           +1.0             1
     9      10            +4            +.8            10
     8      45            +3            +.6            45
     7     120            +2            +.4           120
     6     210            +1            +.2           210
     5     252             0             .0           252
     4     210            -1            -.2           210
     3     120            -2            -.4           120
     2      45            -3            -.6            45
     1      10            -4            -.8            10
     0       1            -5           -1.0             1

Chance will also operate to produce variability in the differences, the
standard deviation of which can be specified. We have seen that the variance
of the difference is equal to the variance of the sum. The variance of
the sum is nothing more than the variance of the distribution of heads when
10 coins are tossed an infinite number of times, hence the variance of the
difference is also simply npq = 10(.5)(.5). Note that 10(.5)(.5) = 5(.5)(.5)
+ 5(.5)(.5). In general, when $n_t = n_j + n_k$ we can say that the variance of
the sum will be the sum of the separate variances, i.e.,
$n_t pq = n_j pq + n_k pq$. At this point, it should be obvious to the student
that the variance of the sum of heads obtained on an infinite number of
simultaneous tosses for any values of $n_j$ and $n_k$, not necessarily equal,
will be given by summing the separate variances. It is not obvious that this
also holds generally for the variance of the differences. Later we will have
an algebraic proof, showing that the variance of a sum (or difference) is
always equal to the sum of the separate variances when the events (scores)
being summed are independent.
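
Pending the algebraic proof, the claim can at least be checked numerically for particular cases. The sketch below is an illustration under assumptions: the function names are hypothetical, and exact fractions are used so that the comparison involves no rounding. It builds the distributions of the sum and of the difference for two independent sets of coins, which need not be equal in number, and computes their means and variances.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Probabilities of 0..n successes in n independent trials."""
    return [comb(n, h) * p**h * (1 - p)**(n - h) for h in range(n + 1)]

def mean_and_variance(dist):
    """dist maps a score to its probability."""
    mean = sum(x * pr for x, pr in dist.items())
    var = sum((x - mean) ** 2 * pr for x, pr in dist.items())
    return mean, var

def sum_and_difference(n_j, n_k, p):
    """(mean, variance) of J + K and of J - K for independent binomial counts."""
    sums, diffs = {}, {}
    for j, prob_j in enumerate(binomial_pmf(n_j, p)):
        for k, prob_k in enumerate(binomial_pmf(n_k, p)):
            joint = prob_j * prob_k              # multiplication theorem
            sums[j + k] = sums.get(j + k, 0) + joint
            diffs[j - k] = diffs.get(j - k, 0) + joint
    return mean_and_variance(sums), mean_and_variance(diffs)

half = Fraction(1, 2)
(sum_mean, sum_var), (diff_mean, diff_var) = sum_and_difference(5, 5, half)
print(sum_mean, sum_var, diff_mean, diff_var)

# Unequal numbers of coins and a p other than 1/2 (hypothetical values).
unequal = sum_and_difference(5, 3, Fraction(1, 3))
print(unequal)
```

In each case the variance of the difference equals the variance of the sum, and both equal the sum of the separate variances, npq for each set.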

In Table 5.4 we have a chance expected, or random, sampling distribution
of differences in number of heads, with $\mu = 0$ and
$\sigma = \sqrt{10(.5)(.5)}$. Suppose that J and K changed their “scoring”
system from number of heads per toss to the proportion of heads per toss by
simply dividing the former by n = 5. Thus, they would have a scale running as
.0, .2, .4, .6, .8, 1.0 along the ordinate and abscissa of Table 5.3. The
differences between these proportions as scores would run +1.0, +.8, +.6,
+.4, +.2, .0, -.2, -.4, -.6, -.8, -1.0, as shown in the not yet discussed
right-hand part of Table 5.4. Note that in changing the scale from number of
heads per toss to proportion of heads per toss, both J and K divided the
former by n = 5. Note further that the scale for differences in proportions
($D_p$) in Table 5.4 can be obtained by dividing the $D_h$ scale values
(center of the table) by n = 5. This change of scale leaves $\mu = 0$
unchanged; however, the standard deviation is changed:
$\sigma_{D_p} = \frac{1}{5}\sigma_{D_h}$. More generally, if J and K each
toss n coins (or roll n dice) an infinite number of times, the variance of the
random sampling distribution of the differences, in proportion units, for
their simultaneous tosses (or rolls) will be given by

$$\sigma^2_{D_p} = \frac{1}{n^2}(npq + npq) = \frac{pq}{n} + \frac{pq}{n}$$


The foregoing rather lengthy development shows one way of arriving at
a formula for the variance of the sampling distribution of the differences
between independent proportions under the specified conditions, but these
conditions ($n_j = n_k = n$, and known p) are seldom, if ever, encountered in
research work. In practice we will have two proportions, $p_1$ and $p_2$, based
on $N_1$ and $N_2$ cases. Both $p_1$ and $p_2$ will be subject to sampling
variation, hence their difference will also be influenced by sampling error.
We will not know the two population proportions necessary for specifying
exactly the standard errors for $p_1$ and $p_2$ and for their difference. We
must, therefore, resort to estimation. For this purpose we will assume the
null hypothesis to be true; if true, the proportions for the populations will
be the same. The best available estimate for this unknown common population
proportion will be obtained by pooling the two samples, i.e., by
taking $p_c$, the proportion for the two samples combined, as the estimate.
Then with $q_c = 1 - p_c$, we take the following as our estimate of the
standard error of the difference between two independent proportions:

$$s_{D_p} = \sqrt{\frac{p_c q_c}{N_1} + \frac{p_c q_c}{N_2}} = \sqrt{p_c q_c (1/N_1 + 1/N_2)} \qquad (5.6)$$

The value of $p_c$ is readily obtained by combining the two frequencies of yeses
(or whatever the given category is) and dividing by $N_c = N_1 + N_2$, and
as usual $q_c = 1 - p_c$. An observed difference divided by $s_{D_p}$ will give
a z interpretable as a unit normal curve deviate provided the Ns are not too
small and $p_c$ is not too extreme. The rule-of-thumb is that $p_c$ or $q_c$
(whichever is smaller) times $N_1$ or $N_2$ (whichever is smaller) shall exceed
5. When this product is between 5 and 10, a correction for continuity should
be incorporated. This may be done by reducing the numerical (absolute)
value of the difference, $p_1 - p_2$, by the quantity
$\frac{1}{2}\left(\frac{1}{N_1} + \frac{1}{N_2}\right)$.
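
The pooled procedure can be sketched as a function. This is a minimal illustration under assumptions: the function name `z_independent_proportions` and the counts are hypothetical, the continuity correction is taken in its usual form of one-half of (1/N1 + 1/N2), and a rule-of-thumb product of 5 or less is refused rather than approximated.

```python
import math

def z_independent_proportions(yes_1, n_1, yes_2, n_2):
    """z for the difference between two independent proportions, pooled under the null hypothesis."""
    p_1, p_2 = yes_1 / n_1, yes_2 / n_2
    p_c = (yes_1 + yes_2) / (n_1 + n_2)          # proportion for the samples combined
    q_c = 1 - p_c
    s_d = math.sqrt(p_c * q_c * (1 / n_1 + 1 / n_2))

    product = min(p_c, q_c) * min(n_1, n_2)      # rule-of-thumb quantity
    if product <= 5:
        raise ValueError("rule of thumb not met: product does not exceed 5")

    diff = p_1 - p_2
    if product <= 10:
        # Correction for continuity: shrink the absolute difference toward zero.
        correction = 0.5 * (1 / n_1 + 1 / n_2)
        diff = math.copysign(max(abs(diff) - correction, 0.0), diff)
    return diff / s_d

# Hypothetical data: 60 yeses of 100 in one sample, 45 of 100 in the other.
z_large = z_independent_proportions(60, 100, 45, 100)

# Hypothetical small samples: 12 of 20 versus 6 of 20 (product = 9, so corrected).
z_small = z_independent_proportions(12, 20, 6, 20)

print(round(z_large, 2), round(z_small, 2))
```

Pooling is appropriate only for testing the null hypothesis of no difference; it would not be the natural choice for describing the sampling error of a difference that is presumed to be real.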


SOME GENERAL CONSIDERATIONS 

Before going further we should stop long enough to delineate the general 
problem of hypothesis testing, discuss the question of one-tailed vs. two- 
tailed tests, and consider the problem of what level of significance to adopt. 

Which hypothesis? In general, successive samplings will yield a
sampling distribution of frequencies or of proportions or of differences
between statistical measures or certain ratios (such as z or other ratios, to
be discussed later). Hypotheses, whether statistical or research, are usually
concerned either with differences or with deviations. By research
hypothesis we mean the hypothesis set up on the basis of theory or prior
observation or on logical grounds. Such a hypothesis usually involves a
prediction regarding the outcome of an experiment. By statistical
hypothesis we usually mean a null hypothesis set up for the purpose of
evaluating the research hypothesis.

When we are considering possible differences the null hypothesis,
frequently symbolized as $H_0$, is pitted against an alternate hypothesis,
$H_1$. Now $H_0$ specifies that, for example, $p_{pop(1)} = p_{pop(2)}$ or that
two population values do not differ, whereas $H_1$ might specify that
$p_{pop(1)} > p_{pop(2)}$ or that $p_{pop(1)} < p_{pop(2)}$ or that
$p_{pop(1)} \neq p_{pop(2)}$. Which of these alternates is
appropriate depends on the research hypothesis to be tested by experiment
or what question is to be answered by experiment. An experiment is
carried out which yields sample values, $p_1$ and $p_2$, and the difference
between $p_1$ and $p_2$ is used to test $H_0$ against $H_1$; that is, on the
basis of the obtained difference we are to make a decision as to whether
$H_0$ or $H_1$ is true.

If $H_0$ is true we can specify the probability of obtaining by chance a
difference as great as $p_1 - p_2$ or as great as $p_2 - p_1$ or as great as the
numerical (irrespective of sign) difference, $|p_1 - p_2|$. Let $\alpha$
represent a chosen level of significance, any level such as P = .10 or
P = .05 or P = .01 or P = .001. We reject $H_0$, the null hypothesis, if the
probability of the obtained result is as small as the chosen $\alpha$, and
this rejection implies the acceptance of $H_1$. If $\alpha$ is not reached we
accept $H_0$, but this acceptance merely says that $H_0$ could be true; any of
a whole series of differences near zero could also be true. This
acceptance-rejection business involves risks, to be discussed under “Choice
of level of significance.”

One-tailed vs. two-tailed tests. The three possible alternates listed
previously for $H_1$ have to do with hypotheses admissible on the basis of
either the research hypothesis or the question for which we seek an answer
by way of an experiment. In general, if $H_1$ states that $p_{pop(1)}$ does not
equal $p_{pop(2)}$, a two-tailed test is in order; if $H_1$ specifies which
population value is the larger, a one-tailed test is used. The issue as to
whether we should use a one-tailed test or a two-tailed test depends on
whether the scientific hypothesis being tested (or at times the practical
decision to be made) demands that we be concerned with chance deviations in
just one direction or in both directions. For situations in which we wonder
whether a performance is better than chance, as in blindfold cigarette
discrimination, we are concerned only with results in one direction, since any
performance in which the subject is successful on less than .50 of the trials
leads us, without further statistical ado, to accept the hypothesis that he
cannot discriminate better than chance. Thus a one-tailed test is appropriate.
But for situations in which we wish to decide whether a population is split
50-50 on some question, we need to consider chance sampling deviations in
both directions; hence we should use a two-tailed test.

Next consider the problem of testing the significance of the difference
between two proportions. If, for example, we have the proportion of yeses
to some question for a sample of Republicans and for a sample of
Democrats as a basis for deciding whether Republicans and Democrats differ
on the given issue, we would need to use a two-tailed test: we reject the
hypothesis of no difference in case the obtained difference, irrespective of
direction, has a probability of occurrence which is as small as $\alpha$, the
chosen level of significance. A one-tailed test would be utilized for judging
significance in an experiment in which, for example, we were trying a new
drug to see if it is better as a preventative than some commonly used drug.
The decision to adopt the new drug is made only if the new drug leads to a
greater proportion of immunities; results in only one direction are crucial
to the decision to change drugs. But if we were trying out two drugs with
the idea of adopting the one which is most promising we would use a
two-tailed test since significance in either direction is the basis for
decision.
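
The practical consequence of the choice between one and two tails can be sketched as follows. This is an illustration only: the function name `p_value`, the labels for the alternate hypothesis, and the value z = 1.80 are assumptions made for the example, and the normal curve is taken from Python's standard `statistics` module.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """Probability of a result as extreme as z if the null hypothesis is true.

    alternative: "greater" (H1: first value larger), "less" (H1: first value
    smaller), or "two-sided" (H1: the values differ, direction unspecified).
    """
    unit_normal = NormalDist()
    if alternative == "greater":
        return 1 - unit_normal.cdf(z)
    if alternative == "less":
        return unit_normal.cdf(z)
    if alternative == "two-sided":
        return 2 * (1 - unit_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

z = 1.80   # hypothetical critical ratio
one_tailed = p_value(z, "greater")
two_tailed = p_value(z, "two-sided")
wrong_direction = p_value(z, "less")

print(round(one_tailed, 4), round(two_tailed, 4), round(wrong_direction, 4))
```

With this z the result reaches the .05 level on a one-tailed test in the predicted direction but not on a two-tailed test, and a result in the direction opposite to the prediction has a probability well above one-half.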

It is sometimes argued that whenever the outcome of an experiment is 
predicted on the basis of theory or previous observation, a one-tailed test is 
appropriate since some benefit should accrue to the researcher who has 
predicted the direction of the results as opposed to the investigator who, 
though obtaining similar results, has not predicted the outcome. The 
benefit comes about in that the z for, say, the P = .01 level of significance 
need reach only 2.33 for a one-tailed as compared with 2.58 for a two-tailed 
test. For the P = .05 level the respective values are 1.64 and 1.96. In 
other words a difference, to be significant, does not have to be as large for a 
one-tailed as for a two-tailed test. Since the situation involving prediction

to which two drugs have been administered, and our question is whether
drug A is superior to drug B (a one-tailed test situation). Suppose further 
that the standard error of the difference between the two proportions is .02. 
The exposition will be somewhat simplified if we change to percentage units. 
This is readily accomplished by shifting decimals for the proportions and 
also for the standard error; the latter becomes 2 in percentage units. 

Figure 5.2 shows a series of sampling distribution curves, all with σ = 2,
but with locations differing according to supposed true, or population,
differences of 0, 4, and 8. The top part (a) is for α = .10, the middle (b)
for α = .05, and the bottom (c) for α = .01. In each part an ordinate has
been erected at the difference required for significance at the given α level of
significance. These required differences spring from the fact that for a
one-tailed test the z values that cut off .10, .05, and .01 of a normal curve are
1.28, 1.64, and 2.33 respectively, and since σ is 2, the respective required
differences in percentages would be 2.56, 3.28, and 4.66. Sample differences
falling beyond these values would be in what are termed critical
regions for rejecting the null hypothesis at the three respective α values.
For example, values beyond 4.66 would be in the critical region when the
P = .01 level of significance is adopted.

Table 5.5. Correct and incorrect statistical conclusions

                                    True Situation
Conclusion            No difference         Real difference
Real difference       Type I error (α)      Correct (β)
No difference         Correct (1 − α)       Type II error (1 − β)

From these several sampling distribution curves and with the help of a 
table of the normal curve functions, we can specify the probability of 
committing a type II error for a specified (supposed) true difference. 
If we keep in mind that the probability of a type I error is α (= .10, .05, or
.01), and that we can make a type I error only when the true difference is
zero, we see that the proportionate areas beyond 2.56, 3.28, and 4.66 for
the three curves centering at zero represent the probabilities of making a
type I error for the respective α values. For all sample values in the regions
to the left of 2.56, 3.28, and 4.66 we would correctly accept the null
hypothesis when in reality it is true. The probabilities for correct acceptance
are given by 1 − α, or .90, .95, and .99 respectively.

Let us now consider the supposition that the true difference is 4. If 4 is 
the true difference, any obtained difference falling in the region to the right

Fig. 5.2. Type I and type II errors. (Panel (b): α = .05, z = 1.64; D must equal 3.28.)

of 2.56, 3.28, and 4.66 will, for the respective levels of significance, lead to 
the correct decision that a true difference exists. The probabilities for 
these correct inferences are obtained by expressing 2.56, 3.28, and 4.66 as 
deviations from 4 (the supposed true value being considered), taking each 
deviation relative to the standard error of the difference (= 2), and thus 
obtaining z values of (2.56 − 4)/2 = −.72, (3.28 − 4)/2 = −.36, and
(4.66 − 4)/2 = .33. Looking these values up in a table of the normal
curve we get probabilities, for correctly rejecting the null hypothesis, of
.76, .64, and .37, for the respective specified levels of significance, when
the true difference is 4 percentage points. Probabilities for correctly
rejecting the null hypothesis have been (and are usually) symbolized by β.
Note that all sample values falling in the region to the left of 2.56, 3.28,
and 4.66 (for the curves centering at 4) will lead to the false acceptance of
the null hypothesis. The probabilities of making type II errors will
correspond to the proportionate areas, for the curves centering at 4, to the
left of these three points (when we have the one-tail test as considered
here). These probabilities will, of course, be given to us by 1 − β. Thus
we have .24, .36, and .63 as the probabilities of making a type II error,
when the true difference is 4 and for the .10, .05, and .01 levels of significance.
Note that taking α smaller and smaller increases the probability of
making a type II error.
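
The figures just obtained from the normal table can be checked numerically. The following is only an illustrative sketch: it assumes Python's standard-library `statistics.NormalDist`, and the function name `power_one_tailed` is hypothetical. Because it uses exact normal quantiles rather than the rounded z values 1.28, 1.64, and 2.33, the required differences it prints may differ from those in the text by about a hundredth.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # unit normal curve (mean 0, standard deviation 1)


def power_one_tailed(true_diff, se, alpha):
    """Return (required difference, beta) for an upper one-tailed z test.

    beta follows the usage of the text: the probability of correctly
    rejecting the null hypothesis of zero difference.
    """
    critical = STD_NORMAL.inv_cdf(1 - alpha) * se  # e.g. about 1.64 * 2
    z = (critical - true_diff) / se                # cutoff as a deviation from the true value
    beta = 1 - STD_NORMAL.cdf(z)                   # area to the right of the cutoff
    return critical, beta


# Standard error of 2 percentage units, supposed true difference of 4.
for alpha in (0.10, 0.05, 0.01):
    critical, beta = power_one_tailed(true_diff=4, se=2, alpha=alpha)
    print(f"alpha={alpha:.2f}  required D={critical:.2f}  "
          f"beta={beta:.2f}  type II={1 - beta:.2f}")
```

Changing `true_diff` to 8 gives the corresponding areas under the curve centering at 8.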

For a true difference of 8, we can by a similar line of reasoning obtain
the probability of correctly rejecting the null hypothesis and the probability
of falsely accepting the null hypothesis, when using any one of the specified
values of α. These probabilities will involve the areas, under the curves
centering at 8, to the right of 2.56, 3.28, and 4.66 (for the βs) and to the left
of these same points (for the type II errors). The student can readily verify
that areas to the right of 2.56, 3.28, and 4.66 are approximately .997, .99,
and .95 respectively. Subtracting each of these from unity will yield the
probabilities, .003, .01, and .05, of falsely accepting the null hypothesis or
committing a type II error when the true difference is 8 and for αs of .10,
.05, and .01. Again, the smaller we take α the larger the probability of
making a type II error.

The probabilities given in the last two paragraphs, along with similar 
figures for other supposed true differences, have been assembled in Table 
5.6. A careful study of this table reveals the general rule that the smaller the
value of α the smaller the probability (β) of correctly rejecting the null
hypothesis and the larger the probability (1 − β) of committing a type II
error. Thus when we reduce the probability of making a type I error by
choosing α small, we do so at the risk of more often making a type II error.
Note also that regardless of α, the probability of making a type II error
decreases as the true differences deviate farther and farther from zero. 
This is another way of saying that the larger the true difference the 
more apt we are to detect it by experiment, and conversely the smaller the 
difference the less likely we are to discover it. 

Incidentally, the value of β for various possible true differences is
referred to as the power of the statistical test for detecting the difference. If
we plotted the βs in, say, the α = .05 column of Table 5.6 against the scale
of possible differences, we would have an ascending curve which would
represent the power function of the test. It is beyond the scope of this book
to consider in detail the concepts having to do with the power of a test. It
should be remarked, however, that statistical tests differ in their power,
and to understand this we would need to have more information regarding
various tests that might be used to test a given research hypothesis. For
instance, power depends on the choice of the critical region for rejecting
the null hypothesis; for the first drug problem considered previously, a
one-tailed test is more powerful than a two-tailed test. In Chapter 6 we
will be considering, among other things, differences between averages or 
central values, at which time it will be found that a test based on comparing 
means will be more powerful than one based on medians. 
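
The statement that a one-tailed test is more powerful than a two-tailed test for the drug problem can be illustrated by tabulating both power functions. This is a sketch under stated assumptions, not part of the original exposition: the function names are hypothetical, the standard error is taken as 2 percentage units as in Fig. 5.2, and the true difference is assumed to lie in the predicted (positive) direction.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def one_tailed_power(true_diff, se, alpha):
    """Power when the whole critical region lies in the upper tail."""
    cutoff = STD_NORMAL.inv_cdf(1 - alpha) * se
    return 1 - STD_NORMAL.cdf((cutoff - true_diff) / se)


def two_tailed_power(true_diff, se, alpha):
    """Power when alpha is split equally between the two tails."""
    cutoff = STD_NORMAL.inv_cdf(1 - alpha / 2) * se
    upper = 1 - STD_NORMAL.cdf((cutoff - true_diff) / se)  # beyond +cutoff
    lower = STD_NORMAL.cdf((-cutoff - true_diff) / se)     # beyond -cutoff
    return upper + lower


# Power function at the .05 level for true differences of 0 through 10.
for d in range(0, 11):
    print(f"true difference {d:2d}:  one-tailed {one_tailed_power(d, 2, 0.05):.3f}"
          f"   two-tailed {two_tailed_power(d, 2, 0.05):.3f}")
```

At a true difference of zero both functions return α itself; for every positive true difference the one-tailed column is the larger, which is the sense in which the choice of critical region affects power.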

Perhaps the discerning student will have noted that increasing sample 
size (or sizes) tends to reduce standard errors. In the foregoing discussion 
we supposed that we had Ns and proportions such that the standard error 


Table 5.6. Probability (β) of correctly rejecting the null hypothesis and probability
(1 − β) of type II error associated with three levels of significance (αs of .10, .05, .01)
when certain true differences are supposed to exist

                          β                           1 − β
True           α = .10    .05     .01       α = .10    .05     .01
difference
    1            .22      .13     .03         .78      .87     .97
    2            .39      .26     .09         .61      .74     .91
    3            .59      .44     .20         .41      .56     .80
    4            .76      .64     .37         .24      .36     .63
    5            .89      .79     .57         .11      .21     .43
    6            .96      .91     .75         .04      .09     .25
    7            .99      .97     .88         .01      .03     .12
    8            .997     .99     .95         .003     .01     .05
    9           >.999     .997    .975       <.001     .003    .025
   10           >.999    >.999    .996       <.001    <.001    .004


of the difference (σ_D) was 2 percentage units. Quadrupling the Ns would
reduce the σ_D to 1 percentage unit. How would this affect the results
deduced from Fig. 5.2 and set forth in Table 5.6? Take, for example,
α = .01 and suppose a true difference of 2 percentage points. With σ_D = 1,
an obtained difference would have to fall in the region beyond 2.33 × 1
= 2.33 to be judged significant at the .01 level. With a true difference of 2,
the proportion of sample values falling beyond 2.33, calculated by taking
(2.33 − 2)/1 = .33 = z, is found to be .37. This is a value to be contrasted
with a β of .09 given in Table 5.6. We see, therefore, that quadrupling
the sample Ns has increased fourfold the probability of detecting a
difference of 2 points. Or stated differently, the probability of a type II
error has been reduced from .91 to .63. The moral is plain: one way of
reducing the risk of making a type II error, without increasing the risk of a
type I error, is to increase N or Ns. Whether this is feasible will usually
depend on the resources available to the investigator. 
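
The effect of sample size on β can be sketched in a few lines. The example assumes, as the discussion does, that the standard error of the difference shrinks in proportion to the square root of the factor by which the Ns are multiplied; the names `beta_one_tailed` and `multiple` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def beta_one_tailed(true_diff, se, alpha):
    """Probability of correctly rejecting the null hypothesis (one-tailed test)."""
    cutoff = STD_NORMAL.inv_cdf(1 - alpha) * se
    return 1 - STD_NORMAL.cdf((cutoff - true_diff) / se)


base_se = 2.0  # standard error of the difference, in percentage units
for multiple in (1, 4, 16):
    se = base_se / sqrt(multiple)  # quadrupling the Ns halves the standard error
    beta = beta_one_tailed(true_diff=2, se=se, alpha=0.01)
    print(f"Ns x {multiple:2d}: standard error = {se:.2f}, "
          f"beta = {beta:.2f}, type II = {1 - beta:.2f}")
```

The first two lines of output correspond to the .09 and .37 discussed above; the third shows what a further quadrupling would do.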
theory and its use as a basis for hypothesis testing. This chapter will be
restricted to the large sample situation, with requisite sample size specified 
at appropriate times. 


EMPIRICAL DEMONSTRATION 

The operation of chance sampling errors for means and standard 
deviations can be illustrated by tossing, say, 7 coins 50 times and tabulating 
the number of heads per toss. The obtained frequencies will usually vary 
somewhat from those expected, which would be proportional to 1, 7, 21, 
35, 35, 21, 7, 1 (as obtained by the binomial expansion). When the mean 
number of heads for 50 tosses is computed, it is not likely to be exactly 3.5
(np, the mean of the expected distribution), and the discrepancy from 3.5
can be attributed to chance. Likewise, 100 tosses will show departures
from the expected frequencies, and consequently the mean based on 100
tosses will differ more or less from 3.5. Furthermore, and for the same
reason, the standard deviation of the obtained distribution of heads will
likely differ from 1.323 (√(npq), the σ of the expected frequencies). As an
exercise the student can demonstrate the foregoing statements by actually 
tossing coins. Indeed it will be quite instructive if each class member 
tosses 7 coins 50 times, each time tallying the number of heads that turn up. 
This will lead to a frequency distribution running (possibly) from 0 to 7 
heads, with an N of 50. Then a second series of 50 tosses should be made, 
thus providing a second distribution. The two frequency distributions can 
be combined, so each student will have three distributions, two with Ns of 
50 and one with an N of 100. Note that chance is so operating as to produce
a distribution somewhat similar to the expected, but at the same time
is operating in such a manner as to lead to discrepancies between observed 
and expected frequencies. 

Each student should compute the means and the standard deviations 
for each of the three distributions. Note how far these values depart from
the expected mean of 3.5 and the expected standard deviation of 1.323.
Then the several means and standard deviations secured by the class
members should be brought together. In order better to understand what
happens when each of several persons tosses 7 coins 50 times, i.e., takes a
sample of 50 tosses, a frequency distribution of the Ms, also of the Ss,
based on 50 tosses should be made. Likewise a separate distribution should
be made for the Ms based on 100 tosses; also, the Ss. A study of these
distributions should provide answers to such questions as: Their central
tendencies are near what values? What is the extent of dispersion for these
distributions of Ms and Ss? Is there any difference in the dispersion for
the distribution of means based on 50 tosses and that based on 100 tosses?
How would you account for this difference? In general, what is the shape
of these distributions of Ms and Ss?
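
A student without coins at hand can imitate the exercise on a computer. The sketch below is an assumption-laden stand-in for actual tossing: it uses Python's pseudo-random generator with an arbitrary seed, the helper names are hypothetical, and S is computed with N (not N − 1) in the denominator.

```python
import random
from math import sqrt


def toss_heads(n_coins, rng):
    """Number of heads showing when n_coins fair coins are tossed once."""
    return sum(rng.random() < 0.5 for _ in range(n_coins))


def mean_and_sd(values):
    """Mean and standard deviation (divisor N) of a list of scores."""
    n = len(values)
    m = sum(values) / n
    s = sqrt(sum((v - m) ** 2 for v in values) / n)
    return m, s


rng = random.Random(1)  # arbitrary seed so the run can be repeated
first = [toss_heads(7, rng) for _ in range(50)]   # first series of 50 tosses
second = [toss_heads(7, rng) for _ in range(50)]  # second series of 50 tosses
combined = first + second                         # N of 100

for label, sample in (("first 50", first), ("second 50", second), ("all 100", combined)):
    m, s = mean_and_sd(sample)
    print(f"{label:>9}: M = {m:.3f} (expected 3.5), S = {s:.3f} (expected 1.323)")
```

Running the sketch with different seeds plays the part of different class members, each contributing two Ms and Ss based on 50 tosses and one of each based on 100.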

Table 6.1 shows the distributions of the means obtained by several of 
the author’s classes. Though these are not models for number of intervals, 
they are nevertheless sufficient as a basis for answering the foregoing 
questions. Note that both distributions appear to be normal, that both 
center very near the mean of the theoretical distribution (3.5), and that the 


Table 6.1. Distribution of 600 means based on 50 tosses
and 300 means based on 100 tosses of 7 coins

Mean (interval)                 50 Tosses    100 Tosses
4.00-4.09                            3
3.90-3.99                           14
3.80-3.89                           35            4
3.70-3.79                           50           23
3.60-3.69                           98           58
3.50-3.59                          119           78
3.40-3.49                          120           85
3.30-3.39                           85           32
3.20-3.29                           52           17
3.10-3.19                           21            3
3.00-3.09                            2
2.90-2.99                            1

Number of means                    600          300
Mean of means                    3.516        3.513
S* of distribution of means       .190         .135
Expected S                        .187         .132

* Corrected for grouping.

variability for means based on 100 tosses is less than that based on 50
tosses. It would thus seem that means based on 100 tosses are somewhat
more stable or less variable than those based on 50 tosses. Does this
suggest that a larger number of tosses, i.e., a larger sample, would tend to
iron out the chance factors that operate to produce discrepancies between
the observed distribution of number of heads and the expected distribution
calculated by the binomial expansion? Do you think that means based on
500 tosses would show less dispersion than means based on 100 tosses?

According to the mathematical statisticians, the standard deviation of
the distribution of means is expected to be equal to 1.323 (expected σ of
the distribution of number of heads) divided by the square root of the
sample size. Note at the bottom of Table 6.1 that the Ss of the distributions
of means, .190 and .135, are very near the expected values of .187 and .132
obtained from 1.323/√50 and 1.323/√100, respectively.
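
The comparison at the bottom of Table 6.1 can be imitated by simulation. The following sketch is hypothetical in its details (seed, helper names, and the use of a pseudo-random generator in place of real coins); it draws 600 means based on 50 tosses and 300 means based on 100 tosses and sets the observed S of each distribution of means beside the value expected from 1.323 divided by the square root of the sample size.

```python
import random
from math import sqrt


def sample_mean_heads(n_tosses, rng, n_coins=7):
    """Mean number of heads per toss over n_tosses tosses of n_coins coins."""
    total_heads = sum(rng.random() < 0.5 for _ in range(n_tosses * n_coins))
    return total_heads / n_tosses


def sd(values):
    """Standard deviation with divisor N."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


SIGMA = sqrt(7 * 0.5 * 0.5)  # sqrt(npq) = 1.323 for 7 fair coins
rng = random.Random(2)       # arbitrary seed

results = {}
for n_tosses, n_means in ((50, 600), (100, 300)):
    means = [sample_mean_heads(n_tosses, rng) for _ in range(n_means)]
    observed = sd(means)
    expected = SIGMA / sqrt(n_tosses)
    results[n_tosses] = (observed, expected)
    print(f"{n_means} means of {n_tosses} tosses: "
          f"observed S = {observed:.3f}, expected S = {expected:.3f}")
```

The observed values will not reproduce .190 and .135 exactly, since they are themselves subject to chance, but they should fall close to the expected .187 and .132.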

Summarizing the results of the foregoing empirical work, we see that 
the means for successive samples tend to distribute themselves normally 
about the expected or universe mean, μ, with a spread or standard
deviation which is very near the value predicted by mathematical theory. 
The student should keep these empirical distributions and deductions 
therefrom in mind as we now proceed to a more detailed consideration 
of what the mathematical statistician says will happen when successive 
samples of a given size are drawn from a defined universe or population or 
supply. 

MORE SAMPLING THEORY 

The discussion here holds for what is known as simple random sampling. 
As specified in Chapter 5, the conditions for simple random sampling are 
that the sample should be drawn in such a way that each individual 
(person, plant, animal, etc.) in the defined universe shall have an equal 
chance of being included in the sample, and that the drawing of one 
individual shall in no way affect the drawing of another. The aim is, of 
course, to obtain a sample which will, within limits of random or chance 
errors, be representative of the universe from which it was drawn. 

Let 

N = the number of cases, or size of sample. 

M = the mean of any sample (known, i.e., computed).

S = the standard deviation of any sample (known, i.e., computed).

μ = the mean of the defined population (unknown).

σ = the standard deviation of the defined population (unknown).

The μ and σ are for the distribution of scores or measurements for all
the individuals in the defined universe. It is not assumed that this universe
distribution is exactly normal; it may be skewed slightly. Strictly speaking,
the number, N_pop, of cases in the universe should be infinitely large, but
failure to meet this requirement is not serious. As will be seen later, the
adjustment necessary when a sample of N cases is drawn from a limited
(finite) universe of N_pop cases is of the order of N/N_pop; if it is known
that N_pop is very large relative to N, the formulations about to be presented
will be sufficiently accurate for all practical purposes.

Now suppose we draw a sample of N cases, compute the mean and 
standard deviation, then draw another sample of the same size and 
compute its mean and standard deviation, and so on until a large number
of samples, say 10,000, have been drawn. We will then have 10,000
means and 10,000 standard deviations, each based on N cases. When we
make a distribution of the 10,000 means and of the 10,000 standard
deviations, we have random sampling distributions. From the point of
view of mathematical rigor, the number of successive samples should be
much larger than 10,000, certainly far larger than the 600, or 300, successive
samples of Table 6.1, in which we have only the beginning of two
random sampling distributions. 

By rather complex mathematical methods it can be shown that, if
successive samples of constant size, N, are drawn randomly from a normally
distributed universe or population with mean equal to μ and standard
deviation equal to σ, the successive sample means will be normally distributed
about μ, and the standard deviation of this sampling distribution
will be σ/√N. The random sampling distribution of the successive standard
deviations will center at σ (there is a small bias here which need not
concern us at this time). For N large (100 or more) this distribution of Ss
will be approximately normal with standard deviation equal to σ/√(2N).
These mathematical findings have often been checked empirically. Table 
6.1 provides a limited check on the sampling theory regarding the mean. 
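
A similar empirical check can be made on both results at once. In the sketch below the universe values (μ = 50, σ = 10), the sample size, the number of samples, and the seed are all hypothetical choices made for illustration; 2,000 samples is far short of the number required for rigor but enough to show the trend.

```python
import random
from math import sqrt

MU, SIGMA = 50.0, 10.0      # hypothetical universe mean and standard deviation
N, N_SAMPLES = 100, 2000    # size of each sample, number of successive samples


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Standard deviation with divisor N."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


rng = random.Random(3)      # arbitrary seed
means, sds = [], []
for _ in range(N_SAMPLES):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]  # normally distributed universe
    means.append(mean(sample))
    sds.append(sd(sample))

sd_of_means = sd(means)
sd_of_sds = sd(sds)
print(f"Ms center at {mean(means):.2f}; their S = {sd_of_means:.3f} "
      f"(theory {SIGMA / sqrt(N):.3f})")
print(f"Ss center at {mean(sds):.2f}; their S = {sd_of_sds:.3f} "
      f"(theory {SIGMA / sqrt(2 * N):.3f})")
```

The Ss center slightly below σ, which is the small bias mentioned above.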

We are now in position to consider a term used in Chapter 5. In general, 
the standard error of a statistical measure is the standard deviation of the 
sampling distribution for the given measure. The square of the standard 
error is called the sampling variance. For the practical statistician, the
sampling distribution is hypothetical, and hence its standard deviation
must be determined by a different formula from that used for computation
from an actual distribution. The value given by σ/√N is called the
standard error of the mean and may be designated as σ_M. Each sample
mean can be expressed in relative deviate form as (M − μ)/σ_M, and these
relative deviates will form a normal distribution with mean of zero and
standard deviation of unity. By reference to Table A we can readily
specify the chances of obtaining a sample mean yielding a deviation as
great as that for a given M, provided the value of μ is known. But in
practical work μ is the unknown about which we desire to make an inference
on the basis of just one sample.

Before resolving this practical problem, we must call attention to the fact
that the universe standard deviation, σ, needed to obtain σ_M is also an
unknown. A single sample will yield a standard deviation, S, which,
being a sample value, will of course deviate more or less from σ. In
order that an inference about μ may be made from a single sample,
σ_M is estimated by using S/√N; i.e., the unknown σ is replaced by the
sample S as an estimate. Instead of the true value for the standard error
of the mean as given by σ/√N we have an approximate value, S/√N.
Let S_M, defined as S/√N, stand for this approximate standard error.

The ignorance concerning σ, and the consequent approximate value
for the standard error of a given mean, call for some reconsideration. As already
pointed out, the means from successive samples will be distributed
normally, and the relative deviates, (M − μ)/σ_M, will likewise be distributed
normally since σ_M is a constant. When (as is nearly always
the case) we have S instead of σ and wish to make an inference about a
universe mean, we need to know something of the sampling behavior of
successive sample means expressed as relative deviates from μ where S_M is
not a constant but varies from sample to sample because the several sample
standard deviations vary. The relative deviate for the first sample
mean will be (M_1 − μ) divided by S_1/√N; for the second sample,
(M_2 − μ) divided by S_2/√N; and so on. The distribution of these
relative deviates will not approximate normality unless N is fairly large.
Thus the use of an estimate in determining σ_M imposes the restriction
that N shall not be too small. If N is not less than 30, we can safely use the
normal curve as the basis for drawing an inference or testing a hypothesis
regarding μ. This chapter's discussion of sampling is therefore not
applicable unless N is greater than 30. The refinements necessary for Ns
less than 30 will be given in Chapter 7.

HYPOTHESES REGARDING A SINGLE MEASURE

How the foregoing theory is used in reaching a conclusion
about a population value depends on the
practical problem faced by the investigator. We shall now consider hypothesis
testing, and later we shall take up a type of inference which is useful
both when we do and do not have a research hypothesis in mind.

Single mean. The procedure for testing a hypothesis about a population
mean on the basis of a sample mean (N cases) is very similar to
that for testing a hypothesis regarding a proportion (discussed
earlier, pp. 48-50). We let μ_h stand for the hypothesized value of μ.
Our sample mean, M, taken as a deviation from μ_h, is expressed in the
form of a z, that is, as (M − μ_h)/S_M. Sampling theory tells us that if μ_h is true,
M will be distributed normally about μ_h with standard ... raising the question as to whether it is

error of the mean difference: S_{M_D} = S_D/√N. In other words, a mean
change is treated just like any other mean. Regardless of any hunch or 
prediction about the effect of the experience (or the effect of the change in 
conditions), the null hypothesis is set that there is no effect. This is equivalent
to saying that, if we had X_1 and X_2 scores on the defined population,
the value of μ_D would be zero. If this hypothesis is true and if we were to
take successive samples of size N, we would expect that the sample means
would be distributed normally about zero with standard deviation S_{M_D}. To test the null
hypothesis we simply take our obtained M_D as a deviation from the null
value of zero and divide by S_{M_D}. That is, z = (M_D − 0)/S_{M_D} = M_D/S_{M_D}.
This z is then used as an entry into Table A in order to specify the
probability of as large a mean difference as our sample M_D arising solely
on the basis of chance sampling. Whether we reject or accept the hypothesis
of no effect depends on whether P does or does not reach the chosen
level of significance. We could use a one-tailed test here if the research
hypothesis predicted the direction of the change, but if we had no a priori 
hypothesis as to the direction of change we would need to use the two-tailed 
test. 
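
The test of a mean change may be sketched as follows. The scores are invented for illustration (36 cases, so that N exceeds 30), the function name `mean_change_test` is hypothetical, and the normal curve from Python's `statistics.NormalDist` stands in for Table A. The one-tailed probability is meaningful only when the obtained change lies in the predicted direction.

```python
from math import sqrt
from statistics import NormalDist


def mean_change_test(before, after, two_tailed=True):
    """z test of the null hypothesis that the population mean change is zero."""
    diffs = [x2 - x1 for x1, x2 in zip(before, after)]  # D = X2 - X1, signs kept
    n = len(diffs)
    m_d = sum(diffs) / n                                 # algebraic sum divided by N
    s_d = sqrt(sum(d * d for d in diffs) / n - m_d ** 2) # gross-score form of S
    s_md = s_d / sqrt(n)                                 # standard error of the mean difference
    z = (m_d - 0) / s_md
    tail = 1 - NormalDist().cdf(abs(z))                  # area beyond |z| in one tail
    p = 2 * tail if two_tailed else tail
    return m_d, s_md, z, p


# Hypothetical scores for 36 individuals under two conditions.
before = [20 + (i % 7) for i in range(36)]
after = [b + ((3 * i) % 5) - 1 for i, b in enumerate(before)]  # changes of -1 to +3

m_d, s_md, z, p = mean_change_test(before, after)
print(f"M_D = {m_d:.3f}, S_MD = {s_md:.3f}, z = {z:.2f}, two-tailed P = {p:.5f}")
```

Note that the differences here include negative and zero values; taking the algebraic sum handles them without special treatment.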

A word should be inserted about the required computations since there 
is some danger of confusion when we are confronted with the calculation 
of M and S for scores (changes) which are both positive and negative, and 
sometimes zero. The gross score formula for the mean (3.2) and that for
the standard deviation (3.7) are applicable provided we take ΣD (equivalent
to ΣX) as the algebraic sum. The equivalent of ΣX², that is, ΣD²,
raises no problem since the squaring process automatically eliminates
negative signs. There are two reasons why we should make a frequency
distribution of the Ds. First, the theory assumes that the Ds approximate a
normal distribution; if a distribution is made we have at least a rough check
on this assumption (there are statistical methods for checking this assumption;
see p. 79 and also p. 231). Second, if N is sizable, computation from
a frequency distribution is more economical of time than use of the gross
score formulas. In laying out the intervals, we must provide a place for
tabulating zero Ds. This can conveniently be accomplished by the following
illustrative scheme which includes only the four intervals near zero:
2 to 3, 0 to 1, −1 to −2, −3 to −4 (for i = 2); 3 to 5, 0 to 2, −1 to −3, −4 to −6 (for i = 3);
4 to 7, 0 to 3, −1 to −4, −5 to −8 (for i = 4); etc. Note that the last given intervals
in each set are for negative Ds. AO taken as the midpoint of the bottom
interval will be a negative number, and must be treated as such, when
entered into formula (3.3).

Other single measures. The general theory of statistical inference 
may be extended to testing hypotheses concerning any descriptive measure, 
provided information is available (from the mathematical statistician)
concerning the characteristics of the random sampling distribution of the
measure. When the sampling distribution is normal in form with known
or estimable variability, we may proceed to test hypotheses by setting up a
z, or x/S, or x/σ. For this purpose we need formulas for the standard
errors of different measures. The formulas about to be presented are
based on the assumption that the score distribution is normal or approximately
so.

As previously noted, for N greater than 30 we may safely use

$$S_M = \frac{S}{\sqrt{N}} \qquad (6.1)$$

as the standard error of the mean. For N greater than 100 it is safe to take

$$S_{Mdn} = \frac{1.253\,S}{\sqrt{N}} \qquad (6.2)$$

as the standard error of the median. A comparison of the standard error 
of the mean with that of the median indicates that the mean fluctuates less 
than the median; i.e., the mean is a more stable measure of central value 
than the median. In order to reduce the standard error of the median to 
the same magnitude as that of the mean it is necessary to take 57 per cent 
more cases, i.e., increase N by 57 per cent. It follows from this that the use 
of the median for distributions which are reasonably normal in form is 
equivalent to throwing away a large proportion of the cases. 
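
The figure of 57 per cent follows directly from formulas (6.1) and (6.2), as a short computation shows. The values of S and N below are hypothetical; only the ratio matters.

```python
from math import sqrt


def se_mean(s, n):
    """Standard error of the mean, formula (6.1)."""
    return s / sqrt(n)


def se_median(s, n):
    """Standard error of the median, formula (6.2)."""
    return 1.253 * s / sqrt(n)


s, n = 10.0, 400                         # hypothetical sample S and N
ratio = se_median(s, n) / se_mean(s, n)  # 1.253, whatever S and N may be
extra_cases = ratio ** 2 - 1             # proportionate increase in N required
n_needed = n * ratio ** 2                # N giving the median the mean's standard error

print(f"standard error of mean   = {se_mean(s, n):.4f}")
print(f"standard error of median = {se_median(s, n):.4f}")
print(f"extra cases needed = {extra_cases:.0%}, i.e., N of about {n_needed:.0f}")
```

Because a standard error varies inversely with the square root of N, the ratio 1.253 must be squared to give the required increase in cases.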

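The sampling errors involved in measures of dispersion are

$$S_S = \frac{S}{\sqrt{2N}} = \frac{.707\,S}{\sqrt{N}}, \qquad S_{AD} = \frac{.756\,(AD)}{\sqrt{N}}, \qquad S_Q = \frac{1.166\,(Q)}{\sqrt{N}} \qquad (6.3)$$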


From these error formulas it will be seen that, considering the error relative 
to the magnitude of the measures of dispersion, S is the most stable measure 
of variation. Provided N is 100 or more, the sampling distributions for
these measures of dispersion are such that their standard errors can be 
utilized in exactly the same way as the standard error of the mean. 

The standard errors for measures of skewness and kurtosis, as defined 
on p. 26, are 



(6.5) 






These two formulas are based on the assumption that the sample has been 
drawn from a normally distributed population, and therefore they can be 
legitimately used in testing the assumption of normality. It will be recalled 
that, for normal distributions, both g_1 and g_2 are equal to zero but for a
sample they may not be zero; however, sample values should not show
a greater deviation from zero than can be reasonably attributed to chance.
If a sample yields a g_1 value which is more than, say, 2.58 times its sampling
error we would suspect that the sample was not drawn from a symmetrically
distributed supply. Likewise, if g_2 deviates more than 2.58 times its
standard error, we would question whether it is reasonable to believe that 
the population or supply is distributed with normal kurtosis. A two-tailed 
test is appropriate here, and consequently choosing 2.58 is equivalent to 
adopting the .01 level of significance. 
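
The decision rule just described can be sketched in code, though only under assumptions that should be stated loudly. The definitions of g_1 and g_2 and their standard errors are not reproduced in this passage, so the sketch assumes the common moment forms g_1 = m_3/m_2^1.5 and g_2 = m_4/m_2² − 3, together with the common large-sample approximations √(6/N) and √(24/N) for their standard errors under normality. These may differ from the exact forms of the text's own formulas; all function names are hypothetical.

```python
from math import sqrt


def g_statistics(values):
    """Moment-based skewness (g1) and excess kurtosis (g2); ASSUMED definitions."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def normality_flags(values, z_crit=2.58):
    """Flag g1 or g2 when it exceeds z_crit times its (assumed) standard error."""
    n = len(values)
    g1, g2 = g_statistics(values)
    se_g1 = sqrt(6 / n)    # ASSUMED large-sample standard error of g1
    se_g2 = sqrt(24 / n)   # ASSUMED large-sample standard error of g2
    return {
        "g1": g1,
        "g2": g2,
        "skewness_suspect": abs(g1) > z_crit * se_g1,
        "kurtosis_suspect": abs(g2) > z_crit * se_g2,
    }


# Hypothetical, markedly skewed set of 100 scores.
skewed = [1] * 90 + [10] * 10
print(normality_flags(skewed))
```

With z_crit left at 2.58 the two-tailed rule corresponds to the .01 level, as in the text.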


HYPOTHESES ABOUT DIFFERENCES 

One of the foremost problems in practical statistics is the comparison 
of group trends. We may wonder whether one college group is superior to 
another, whether practice on a task improves performance, whether rats 
learn more rapidly when food or when water is the incentive, whether 
reaction time is faster to sound than to light, whether the sexes show a 
difference in variational tendency, whether one learning method is better 
than another, etc. In order to answer questions like the above, it is necessary
to make observations on samples from two groups or on the same
group under two different experimental conditions, and then to compute
appropriate statistical measures for the variable on which we wish to make
the comparison.

Thus typically, we have two samples of N_1 and N_2 cases or two sets of
scores on just N cases under two different conditions, with means M_1 and
M_2 and standard deviations S_1 and S_2, where the subscripts refer to the two
sets of scores. As we have learned, each mean is subject to sampling
fluctuations; therefore the difference between the means will also be
subject to sampling fluctuations. Even though μ_1 = μ_2, there may be a
difference between sample means because of chance sampling errors. To
test an obtained difference for significance we will need a measure of the
sampling error of differences, i.e., the standard error of the difference
between two means. Knowing this standard error we can set up the
null hypothesis that there is no difference between the two population
means and then reject or accept this hypothesis according to whether the
obtained difference does or does not reach an appropriate level of significance.

Here, as in the case of the difference between proportions, we must
distinguish between the situation where our two means are based on
independent as opposed to nonindependent (correlated) scores.

Difference between correlated means. Let us again consider the method
for testing the significance of a mean change. As
implied there, the X_1 and X_2 scores could stand for performance for N
individuals under two different conditions. A little simple algebra at this
point will lead to some interesting results. As before, we let

$$D = X_2 - X_1$$

By definition the mean of the distribution of these N difference scores is

$$M_D = \frac{\sum D}{N} = \frac{\sum (X_2 - X_1)}{N} = \frac{\sum X_2}{N} - \frac{\sum X_1}{N}$$

hence

$$M_D = M_2 - M_1 = D_M$$

by which we see that the mean of the differences is equal to the difference
between the two means. This will, of course, be true for any sample. It
follows therefore that when we test the significance of M_D as a deviation
from zero we are also testing the significance of D_M as a deviation from
zero. In other words, we are testing the significance of the difference
between two means based on the same N cases.

In testing M_D we calculated S_D, thence S_{M_D}. Let us consider a bit
further the standard deviation of the distribution of differences, S_D. We
first express the Ds as deviations from their own mean, i.e., d = D − M_D.
Since D = X_2 − X_1 and M_D = M_2 − M_1, we have

$$d = (X_2 - X_1) - (M_2 - M_1)$$

which, when the parentheses are removed and the terms shifted, becomes

$$d = X_2 - M_2 - X_1 + M_1$$

or

$$d = (X_2 - M_2) - (X_1 - M_1)$$

Both these new parentheses terms define deviation units of the type
x = X − M, so that d = x_2 − x_1. The standard deviation squared, or
variance, of the differences can be expressed by substituting d for x in
the formula for the variance:

$$S_D^2 = \frac{\sum d^2}{N}$$

If we replace d by its equivalent, we have

$$S_D^2 = \frac{\sum (x_2 - x_1)^2}{N} = \frac{\sum x_2^2}{N} + \frac{\sum x_1^2}{N} - \frac{2\sum x_2 x_1}{N}$$

The first two of the three terms on the right are obviously the variances for
the second and first sets of scores. The last term, involving the sum of the
cross products of x_2 and the x_1 with which it is paired, has to do with the
degree of correlation between, or similarity of, the scores that belong to the
same individual. The reader is asked to take on faith, without further
explanation here, the fact that the last term becomes 2r_12 S_1 S_2, in which r is a
measure of correlation. Hence we can write

$$S_D^2 = S_2^2 + S_1^2 - 2 r_{12} S_1 S_2 \qquad (6.6)$$

or

$$S_D = \sqrt{S_2^2 + S_1^2 - 2 r_{12} S_1 S_2}$$

Since the standard error of any mean is given by dividing the standard
deviation by the square root of $N$, we secure the standard error of the
mean difference by dividing $S_D$ by $\sqrt{N}$, i.e.,

$$S_{M_D} = \frac{S_D}{\sqrt{N}} = \frac{\sqrt{S_1^2 + S_2^2 - 2r_{12}S_1S_2}}{\sqrt{N}} = \sqrt{\frac{S_1^2}{N} + \frac{S_2^2}{N} - \frac{2r_{12}S_1S_2}{N}}$$


The first two terms under the last radical are the sampling variances of the
two means, and since $2r_{12}S_1S_2/N$ can be written as

$$2r_{12}\,\frac{S_1}{\sqrt{N}}\,\frac{S_2}{\sqrt{N}}$$

we have finally that

$$S_{M_D} = \sqrt{S_{M_1}^2 + S_{M_2}^2 - 2r_{12}S_{M_1}S_{M_2}}$$


Since each $M_D = D_M$, it follows that $S_{M_D} = S_{D_M}$, or that the standard
error of the mean difference is equal to the standard error of the difference
between the two means. Thus we have two ways for evaluating a difference
between nonindependent means. We can compute $M_D$, $S_D$; thence

$$S_{M_D} = \frac{S_D}{\sqrt{N}} \tag{6.7}$$

or we can compute $M_1$, $M_2$, $S_1$, $S_2$, and $r_{12}$, and then obtain

$$S_{D_M} = \sqrt{S_{M_1}^2 + S_{M_2}^2 - 2r_{12}S_{M_1}S_{M_2}} \tag{6.8}$$
Formula (6.8) is usually referred to as the standard error of the difference
between correlated means, hence the symbol $S_{D_M}$.
But by working with the difference between paired scores, we can obtain
the standard error of the mean difference (= difference between means)
without computing $r$. Even after we have learned how to compute $r$, it
matters not whether we compute the standard error of the difference
between means of related scores by formula (6.8), or whether we compute
its equivalent, the standard error of the mean of the differences.
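
The equivalence of formulas (6.7) and (6.8) can be checked numerically. The
following is a minimal sketch in Python, not part of the original text: the
function names and the pretest and posttest scores are hypothetical, and the
standard deviation uses $N$ in the denominator, as this chapter does.

```python
import math

def pop_sd(values):
    """Standard deviation with N in the denominator (the S of this chapter)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)

def pearson_r(x, y):
    """Product-moment correlation between paired score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return cross / (pop_sd(x) * pop_sd(y))

def se_mean_difference(x1, x2):
    """Formula (6.7): S_D / sqrt(N), computed from the paired differences."""
    d = [b - a for a, b in zip(x1, x2)]
    return pop_sd(d) / math.sqrt(len(d))

def se_difference_correlated_means(x1, x2):
    """Formula (6.8): uses the two standard errors and r_12."""
    n = len(x1)
    sm1 = pop_sd(x1) / math.sqrt(n)
    sm2 = pop_sd(x2) / math.sqrt(n)
    r12 = pearson_r(x1, x2)
    return math.sqrt(sm1 ** 2 + sm2 ** 2 - 2 * r12 * sm1 * sm2)

# Hypothetical scores for the same eight persons under two conditions.
pretest = [12, 15, 11, 18, 14, 16, 13, 17]
posttest = [14, 18, 12, 21, 15, 19, 13, 20]

short_way = se_mean_difference(pretest, posttest)
long_way = se_difference_correlated_means(pretest, posttest)
print(short_way, long_way)  # the two values agree apart from rounding error
```

The short route never computes $r$, which is the practical point made above.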

Strictly speaking, the $r_{12}$ in (6.8) should be written as $r_{M_1M_2}$ so as to
indicate that it is a measure of the extent to which successive pairs of
means vary together, but it can be shown that the correlation between
means is the same as $r_{12}$, the correlation between the scores entering into
the means.

Since $M_D = D_M$ and $S_{M_D} = S_{D_M}$, it should be obvious that when
testing the null hypothesis we have

$$z = \frac{M_D}{S_{M_D}} = \frac{D_M}{S_{D_M}}$$

That is, the procedure for testing the null hypothesis that $M_D$ is zero for a
population is equivalent to testing the null hypothesis that $\mu_1 = \mu_2$, where
the subscripts 1 and 2 indicate that we are considering two populations of
scores, one for each condition.

Formulas (6.7) and (6.8) are appropriate in a number of situations in
which an $X_1$ score is somehow paired with an $X_2$ score. Some of the
possibilities are the following:

a. $X_1$ as first trial, then practice, then $X_2$ as later trial; same person.

b. $X_1$ as initial, then experience, then $X_2$ as final; same person.

c. $X_1$ as pretest, then experience, then $X_2$ as posttest; same person.

d. $X_1$ under experimental conditions vs. $X_2$ under normal (or control);
   same person.

e. $X_1$ in one experimental condition vs. $X_2$ in another; same person.

f. $X_1$ as experimental vs. $X_2$ as control; twin or litter pair.

g. $X_1$ as experimental vs. $X_2$ as control; unrelated persons, but matched
   by pairing on pertinent variables. Ditto, for two experimental
   conditions.

For situation (g), which is commonly employed in experimental work,
we can think of having drawn $N$ individuals at random for one group, then
forming the second group by selecting individuals who can be paired with
the members of the first group on the basis of variables which need to be
controlled. Thus any found difference between $M_1$ and $M_2$ will not be
attributable to differences between the two groups with respect to the
variables used in forming the pairs, since the pairing tends to make the
groups equivalent on the pairing variables. This same pairing procedure,
and also twin or litter pairs, can be used for situation (e). Furthermore, as
we shall see below, the $X_1$ and $X_2$ scores can themselves stand for changes:
$X_1$ the change from pretest to posttest under an experimental condition
and $X_2$ the change under another experimental condition or under control
conditions.

The statistical advantages of having scores which are somehow related
will be discussed later under the caption "Reduction of sampling errors."
Difference between independent means. When we have means based on
two samples which have been drawn independently, there will be no way of
pairing scores except on a chance basis, and chance pairing will tend to
produce a zero correlation; in fact, if we took all possible pairs the
correlation would be exactly zero. Thus the correlation term in (6.8)
vanishes, so that the standard error of the difference between means based
on independent samples becomes

$$S_{D_M} = \sqrt{S_{M_1}^2 + S_{M_2}^2} = \sqrt{\frac{S_1^2}{N_1} + \frac{S_2^2}{N_2}} \tag{6.9}$$


This formula is not restricted to samples of the same size; i.e., $N_1$ need not
equal $N_2$. The right-hand form of (6.9) has an obvious computational
advantage.

The standard error obtainable by formula (6.9) may be used in exactly the same
manner as the standard error of the difference by formulas (6.7) and (6.8).
Again we set the null hypothesis that $\mu_1 = \mu_2$, or that the difference
between the population means is zero. If it is zero, the sampling
distribution of $D$ resulting from successive replications will center at zero
with a standard deviation given by the standard error of the difference. If
$D/S_D$ (or $z$) is sufficiently large, the null hypothesis is not acceptable.
In other words, the general procedure for testing hypotheses about differences
is precisely the same for means (and other statistical measures) as that
outlined earlier.
The student would do well to review the discussion dealing with
hypotheses, one-tailed vs. two-tailed tests, choice of level of significance,
and the two types of error one risks in testing hypotheses.
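
The test based on formula (6.9) can be sketched as follows. This is an
illustrative Python sketch only: the function name and the summary figures
are hypothetical, and the two-tailed probability is obtained from the unit
normal curve through the standard library's complementary error function.

```python
import math

def z_for_independent_means(m1, s1, n1, m2, s2, n2):
    """Return (D, S_D, z, two-tailed P) using the right-hand form of (6.9)."""
    diff = m1 - m2
    se_diff = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    z = diff / se_diff
    # Area in both tails of the unit normal curve beyond |z|.
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return diff, se_diff, z, p_two_tailed

# Hypothetical summary values for two independently drawn samples.
diff, se_diff, z, p = z_for_independent_means(52.0, 10.0, 100, 49.0, 12.0, 144)
print(round(diff, 2), round(se_diff, 3), round(z, 2), round(p, 3))
```

With these made-up figures each sampling variance is 1, so the standard error
of the difference is the square root of 2; the sample sizes need not be equal.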

The foregoing scheme of hypothesis testing is applicable for descriptive
measures other than proportions or means. The general pattern for the standard
error of the difference between any two statistical measures, say $C_1$ and
$C_2$, is

$$S_{D_C} = \sqrt{S_{C_1}^2 + S_{C_2}^2 - 2r_{C_1C_2}S_{C_1}S_{C_2}}$$


That is, we need to know the standard error for both $C_1$ and $C_2$ and a
measure of the correlation between $C_1$ and $C_2$ in case of nonindependence
(the $r$ term drops out for independently drawn samples). This correlation,
which is a measure of the extent to which $C_1$ and $C_2$ vary together when
successive samples are drawn, is known to be $r_{12}$ when the $C$s are means
and is known to be $r_{12}^2$ when the $C$s are standard deviations, with $r_{12}$
being the correlation between the scores entering into the means and standard
deviations. Accordingly, the standard error of the difference between two
$S$s based on the same individuals or on scores related consanguineously or
related by pairing on pertinent variables is given by

$$S_{D_S} = \sqrt{S_{S_1}^2 + S_{S_2}^2 - 2r_{12}^2 S_{S_1}S_{S_2}} \tag{6.10}$$

and for $S$s based on independent samples

$$S_{D_S} = \sqrt{S_{S_1}^2 + S_{S_2}^2} = .707\,S_{D_M} \tag{6.11}$$

These formulas are valid for large $N$s (100 or more), and to test the null
hypothesis we simply take $D_S/S_{D_S}$ as a unit normal $z$. (For $N$s small,
see the chapter dealing with small samples.)

The difference between medians based on correlated scores cannot be
tested because the needed $r$ is unknown, but for independent samples we
have

$$S_{D_{mdn}} = \sqrt{S_{mdn_1}^2 + S_{mdn_2}^2}$$

Formulas for other measures can be similarly written for the case of
independent samples.

Any student who is worried because formula (5.5) for the standard error 
of the difference between correlated proportions does not include an r term 
may rest assured that the correlation has been allowed for even though not
explicitly so. Formula (5.5) is analogous to formula (6.7), which we have seen
is equivalent to the longer formula (6.8) in which there is an $r$.

REDUCTION OF SAMPLING ERRORS 

One of the aims of scientific method is to attain as great precision in 
results as is practicable. In statistical work this can be accomplished by 
increasing the accuracy or dependability of the scores or individual 
measurements or responses and by decreasing the chance sampling errors
of the various descriptive measures. One way to reduce sampling errors is 
to employ either the stratified or the area method of sampling, both 
of which are too complicated for us to discuss here. If the random 
sampling method is being used in projects which aim to study the difference 
between groups (or populations), the obvious, and only, way for decreasing
the standard error of the difference is to increase $N$ for either or for both
samples. Most field investigations are of this type. 
In contrast, the experimentalist can define his population with reference
to two laboratory or experimental situations, i.e., a population of
individuals under situation A and a population of individuals under
situation B. His sample individuals for the two situations may be the same
individuals, first under the A and then under the B condition. In general, the
use of the same individuals, if feasible in view of possible practice or
fatigue effects, will usually involve a fairly high degree of correlation, the
net effect of which is to reduce the standard error of the difference
considerably; i.e., it is sometimes possible to reduce sampling error simply by
using the same individuals as the "two" samples. Thus, if we wish to study the
effect of two different degrees of humidity on mental output or efficiency, it
will be a more economical and better controlled experiment if we make
observations on the same individuals under the two conditions A and B, rather
than on $N_1$ individuals under condition A and $N_2$ individuals under
condition B.

If it is not feasible to use the same individuals in the two experimental
situations, we can make up two groups by pairing or matching individuals
on the basis of one or more characteristics. Such a procedure leads to
more nearly comparable groups for our experiment than can be obtained
by choosing individuals at random and, by using either formula (6.7) or (6.8)
instead of (6.9), we can make allowance for the fact that the individuals for
the two samples have not been chosen independently. The use of individuals
who have been paired is considered good experimental technique: it cannot
be said that a found difference between means for the variable being studied
may be due to a lack of comparability of the two groups with respect to the
matching variables. The use of paired individuals has a statistical as well as
experimental advantage in that the sampling error of the difference between
means is thereby reduced without the necessity of increasing the number of
cases. If pairing produces an $r$ of .75, the reduction is equivalent to
that achieved by quadrupling the number of cases when the random
method of forming groups is employed. After the student has learned
about correlation he will better appreciate the fact that the gain in pairing
depends on the extent to which the variables used in pairing are correlated
with the variable being studied.
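
The statement about an $r$ of .75 follows from formulas (6.8) and (6.9). A
short Python sketch, assuming for simplicity that both sets of scores have
the same standard deviation (an assumption made here, not stated in the
text) and using hypothetical numbers:

```python
import math

def se_diff_paired(s, n, r):
    """Formula (6.8) with S_1 = S_2 = s and N pairs (equal-S assumption)."""
    sm = s / math.sqrt(n)
    return math.sqrt(sm ** 2 + sm ** 2 - 2 * r * sm * sm)

def se_diff_independent(s, n):
    """Formula (6.9) with S_1 = S_2 = s and N_1 = N_2 = n."""
    sm = s / math.sqrt(n)
    return math.sqrt(sm ** 2 + sm ** 2)

s = 10.0                                           # hypothetical common S
paired_25 = se_diff_paired(s, 25, 0.75)            # 25 matched pairs, r = .75
independent_25 = se_diff_independent(s, 25)        # 25 cases per random group
independent_100 = se_diff_independent(s, 100)      # four times as many cases

print(paired_25, independent_25, independent_100)
```

Under the equal-S assumption the paired sampling variance is $(1 - r)$ times
the independent one, so $r = .75$ leaves one quarter of the variance, which is
also what a fourfold increase in $N$ accomplishes.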

It is thus seen that, for some types of investigations, greater precision 
can be obtained by judicious planning. If we had unlimited resources we 
could always attain any desired degree of precision by simply taking 
sufficiently large samples. 

Frequently the question is raised as to how many cases should be secured 
for a given study. The answer might be in terms of the number needed to 
reach a given degree of accuracy, but this in turn would raise the question 
of what degree of precision is needed, and this in turn depends on how
small a difference we wish to detect. When group comparisons are made
and when the $N$s are relatively small, the null hypothesis is apt to be
accepted too often for the simple reason that a real difference has to be
sizable before it is demonstrable by small samples. On the other hand, if a
real difference is so small that its statistical demonstration requires
thousands of cases, we may question whether it has practical or scientific
importance.

COMPARISON OF CHANGES 

Although the comparison of changes involves nothing new in the way 
of statistical theory, such comparisons are somewhat more complicated 
than the tests of significance so far discussed. The researcher may be 
interested in either of two questions. First, he may wish to evaluate the 
effect of only one experimental condition or, second, he may wish to 
contrast the changes produced under two (or more) different experimental
conditions.

For the first of these, a sample is selected, measurements are made prior 
to (pretest) and subsequent to (posttest) the provided condition, but, since 
changes from a first to a second measure might occur because of practice 
effect or because of some other experience beyond the control of the
investigator, it is necessary to set up a control group the members of which are
measured and then remeasured, at chronological times corresponding as 
closely as possible to those of the pretest and posttest of the experimental 
group. It is presumed that all uncontrollable effects will be operating 
similarly on both groups so that any difference in change for the two groups 
will have resulted from whatever was done to the members of the experi¬ 
mental group. The statistical problem is that of evaluating the change 
shown by the experimental group compared with that shown by the
control group.

For the second type of question the investigator starts with two
experimental groups, one of which is subjected to one experimental condition
and the other to a second experimental condition, both groups having been 
measured prior to the experience (pretest), and then again after the 
experience (posttest). Since the question is concerned with contrasting 
gains (or losses) associated with the two conditions, a control group is not
needed. Presumably, uncontrollable factors are alike for the two groups. 
The statistical analysis consists of testing for significance the difference 
between the changes shown by the two groups. 

Whether we are dealing with a problem calling for an experimental and
a control group or for two experimental groups, the two groups may be
drawn at random or formed on the basis of the pairing of individuals on
pertinent variables. If the groups are set up on the basis of pairing, we need
to allow for that fact when determining the required standard error of the
difference.

In the handling of the data, we have a total of four means, for the pretest
and the posttest for each of the two groups. By using subscripts, 1 and 2 for
the pretest and posttest, and E and C to represent the two groups, we can
specify the means as $M_{E1}$, $M_{E2}$, $M_{C1}$, and $M_{C2}$. Not all possible
differences between these four will have meaning. Those that have meaning may
be set forth as:

$D_E = M_{E1} - M_{E2}$, the change shown by the experimental group.
$D_C = M_{C1} - M_{C2}$, the change shown by the control group.
$D_1 = M_{E1} - M_{C1}$, the pretest difference between the groups.
$D_2 = M_{E2} - M_{C2}$, the posttest difference between the groups.

Which of these four meaningful differences should we test for
significance? Obviously, it is insufficient to test only $D_E$ because we cannot
be sure that the shift shown, even though nonchance, is really due to the
experimentally provided experience. In fact, the reason for having the control
group is to enable us to evaluate the shift which takes place as a result of
causes other than the experimentally provided experience. Now it might be
thought that if $D_E$ is significant while $D_C$ is less, or not at all,
significant an effect has been demonstrated. This type of comparison, however,
does not test the difference between the two changes, which is what the
question requires. Although, as regards absolute magnitude, $D_E - D_C$ will
always equal $D_1 - D_2$, it is easier to evaluate the former difference.

To get the standard error of $D$ ($= D_E - D_C$) when the groups have
been independently drawn we need the sampling variances of $D_E$ and $D_C$ so
as to substitute in

$$S_D = \sqrt{S_{D_E}^2 + S_{D_C}^2} \tag{6.12}$$

Now $D_E$ is the difference between two means based on the same cases, so we
could get the standard error of $D_E$ by using formula (6.8), but since the
difference between correlated means is equal to the mean difference,
$M_{D_E}$, we can use formula (6.7) to get the required $S_{D_E}^2$. This same
situation holds for the control group; (6.7) can be used to get $S_{D_C}^2$.
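
The procedure for independently drawn groups, with formula (6.7) applied to
each group's change scores and formula (6.12) applied to the two results, can
be sketched in Python. The scores and function names below are hypothetical
illustrations; change is taken as pretest minus posttest, following the
definitions of $D_E$ and $D_C$ given earlier.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pop_sd(values):
    """Standard deviation with N in the denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def change_and_se(pre, post):
    """Mean change (pretest minus posttest) and its standard error by (6.7)."""
    d = [a - b for a, b in zip(pre, post)]
    return mean(d), pop_sd(d) / math.sqrt(len(d))

def compare_changes_independent(e_pre, e_post, c_pre, c_post):
    """Return (D_E - D_C, its standard error by (6.12), z)."""
    d_e, se_e = change_and_se(e_pre, e_post)
    d_c, se_c = change_and_se(c_pre, c_post)
    se = math.sqrt(se_e ** 2 + se_c ** 2)
    return d_e - d_c, se, (d_e - d_c) / se

# Hypothetical scores; the two groups were drawn independently.
exp_pre, exp_post = [10, 12, 14, 16], [13, 16, 17, 20]
ctl_pre, ctl_post = [11, 13, 15, 17], [11, 14, 15, 18]

d, se, z = compare_changes_independent(exp_pre, exp_post, ctl_pre, ctl_post)
print(d, round(se, 4), round(z, 2))
```

Samples this small are used only to keep the arithmetic visible; the $z$
interpretation presupposes much larger $N$s.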
When the two groups have been formed by pairing, our standard error of the
difference between changes must include a correlation term to enable us to
take advantage of the fact that we have a better controlled experiment. The
$r$ needed is the correlation between the changes shown by the members of the
pairs. To compute it we need to consider the paired cases as $i$ and $j$, with
$i$ assigned to the experimental and $j$ to the control group. For each
individual we have a change score, which is nothing other than his pretest
score minus his posttest score. Thus the change scores for a pair are

$$C_i = D_i = X_{1i} - X_{2i} \quad \text{and} \quad C_j = D_j = X_{1j} - X_{2j}$$

The difference between the changes shown by the members of a pair is

$$D = (C_i - C_j) = (D_i - D_j) = (X_{1i} - X_{2i}) - (X_{1j} - X_{2j})$$

The computation process is somewhat simpler by removing the parentheses, thus

$$D = X_{1i} - X_{2i} - X_{1j} + X_{2j}$$

Simply add $X_{1i}$ and $X_{2j}$ and then subtract the sum of $X_{2i}$ and $X_{1j}$.

Once the $N$ $D$s have been determined, we can get $M_D$ and thence
$S_{M_D}$ by formula (6.7). This $M_D$ will equal $D_E - D_C$, or
$(M_{E1} - M_{E2}) - (M_{C1} - M_{C2})$, and this $S_{M_D}$ will be exactly the
same as

$$S_D = \sqrt{S_{D_E}^2 + S_{D_C}^2 - 2r_{D_ED_C}S_{D_E}S_{D_C}}$$

which is the longer formula (equivalent to (6.8)) involving the correlation
between the paired changes. For the case of two experimental groups it is
only necessary to use appropriate subscripts in place of E and C.
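
For groups formed by pairing, the short route through the pair-by-pair $D$s
and the longer route through the correlation between changes give the same
standard error. The Python sketch below illustrates this with hypothetical
scores for five matched pairs; the names are illustrative and change is
again pretest minus posttest.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pop_sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cross / (pop_sd(x) * pop_sd(y))

# Hypothetical scores; position k in each list belongs to matched pair k.
e_pre, e_post = [10, 12, 14, 16, 18], [14, 15, 19, 20, 24]
c_pre, c_post = [11, 12, 13, 17, 18], [12, 12, 15, 18, 20]
n = len(e_pre)

# Short route: D = X_1i - X_2i - X_1j + X_2j for each pair, then (6.7).
big_d = [a - b - c + d for a, b, c, d in zip(e_pre, e_post, c_pre, c_post)]
m_d = mean(big_d)
se_short = pop_sd(big_d) / math.sqrt(n)

# Longer route: standard errors of the two mean changes and their correlation.
d_i = [a - b for a, b in zip(e_pre, e_post)]   # experimental changes
d_j = [c - d for c, d in zip(c_pre, c_post)]   # control changes
se_e = pop_sd(d_i) / math.sqrt(n)
se_c = pop_sd(d_j) / math.sqrt(n)
r = pearson_r(d_i, d_j)
se_long = math.sqrt(se_e ** 2 + se_c ** 2 - 2 * r * se_e * se_c)

print(m_d, se_short, se_long)
```

The short route is the one recommended above, since it avoids computing the
correlation between changes at all.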
INFERENCE: ESTIMATION

So far we have discussed statistical inference from the point of view of 
hypothesis testing, but there are occasions when we may wish to use 
information from a sample as a basis for estimating population values. 
There are two general types of estimation: point and interval. We shall 
discuss the first briefly in order to introduce some concepts which the 
student might encounter, and the second because of its practical implica¬ 
tions. 

Point estimation. We may regard a sample statistic as an estimator 
for the corresponding population value (parameter). How “good” an 
estimator it is depends on whether or not it is unbiased and consistent, and 
on its relative efficiency. 

An estimator is said to be unbiased if the average of a large number of
sample estimates tends to equal the parameter being estimated. The mean
is unbiased because the mean of sample means will approach nearer and
nearer $\mu$ as we take more and more samples, but $S^2$ defined as $\sum x^2/N$ is
biased in that the mean of sample variances tends to be smaller than the
population variance. An unbiased estimate is given by $s^2 = \sum x^2/(N - 1)$,
but for subtle mathematical reasons $s$, or $\sqrt{\sum x^2/(N - 1)}$, involves a
negligible bias as an estimator of the population standard deviation.
Note that the bias is small when $N$ is large.
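
The bias of $S^2$ can be seen exactly, without random sampling, by listing
every possible sample from a very small population. The Python sketch below
is an illustration under stated assumptions: the four-score population is
hypothetical, and samples are drawn with replacement so that each draw is
independent, as in sampling from an indefinitely large universe.

```python
from itertools import product

def variance_n(values):
    """S^2: sum of squared deviations divided by N."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def variance_n_minus_1(values):
    """s^2: sum of squared deviations divided by N - 1."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

population = [2, 4, 6, 8]        # hypothetical population of scores
n = 3                            # sample size
sigma2 = variance_n(population)  # population variance

# Every ordered sample of size n drawn with replacement (4**3 = 64 samples).
samples = list(product(population, repeat=n))
mean_of_S2 = sum(variance_n(s) for s in samples) / len(samples)
mean_of_s2 = sum(variance_n_minus_1(s) for s in samples) / len(samples)

print(sigma2, mean_of_S2, mean_of_s2)
```

The average of $S^2$ over all samples comes out as $(N - 1)/N$ times the
population variance, while the average of $s^2$ equals it; the shortfall
shrinks as $N$ grows.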

An estimator is said to be consistent if it approaches nearer and nearer 
the population value as sample size is increased indefinitely. All the 
measures so far discussed satisfy this criterion. 

The efficiency of an estimator is a function of its sampling error. Thus, 
in terms of efficiency the sample mean is far better than the median as an 
estimator of the central value of a population of normally distributed 
scores even though both are unbiased and consistent estimators. 

Interval estimation: confidence interval. Interval estimation, which 
takes into account the sampling error of an estimator, provides limits, or 
an interval, for the population value, and at a prescribed level of confi¬ 
dence. Given a sample mean and its standard error, one could set up a 
whole series of “trial” hypothesis values for the population mean. All 
trial hypothesis values well above and below the sample mean could be 
rejected at a high level (small P) of significance, but rejection would 
become more and more risky as we approached nearer and nearer the 
sample mean, and for a whole series of values near the sample mean all 
trial hypotheses would be acceptable. This implies that at some point 
above the sample mean and at some point below the sample mean we 
change from rejection to acceptance of the trial values. If we have adopted, 
say, the P = .05 level, the change will obviously be at $M \pm 1.96S_M$. In
rejecting trial values outside these limits and accepting values within these
limits, we are in effect inferring that the population value is in an interval 
defined by these limits. 

It would seem that there should be some way of expressing our degree of 
confidence that the population mean lies between the limits $M \pm 1.96S_M$,
since, as we have seen, we can be somewhat sure that the sample mean is 
not a chance deviation from a population mean outside the limits so 
determined. Note that, given a population mean and sigma, we can 
legitimately speak of the probability of a sample mean falling in a specified 
region, but given a sample mean we cannot speak of the probability of the 
population mean being in a certain region (or interval) for the simple and 
compelling reason that $\mu$, being definitely just one value, has no
distribution. We can in no way enumerate events so as to conceive of a probability
fraction since just one event (value) is possible. 

In order to arrive at a statement which expresses our degree of confidence, 
we note that, if we draw a second sample, we would be apt to have a 
different set of limits for the simple reason that the second sample mean 
may differ from the first. If we take additional samples of the same size, 
we would have a distribution of sample means, hence a sort of distribution 
of sets or pairs of limits, since each sample mean would provide a set. Our 
discussion can be greatly simplified by taking sets of limits given by 
$M \pm 2S_M$ (as approximating the $M \pm 1.96S_M$ values). For simplicity
of exposition, let us assume that we are drawing successive samples from a
population having a mean of 10, and that the $\sigma$ and $N$ are such that
$\sigma_M$ can be taken as 2. Then $M \pm 2\sigma_M$ will be $M \pm 2(2)$, or
$M \pm 4$. It will also facilitate our exposition if we think of the random
sampling distribution of means in terms of intervals of $\tfrac{1}{2}\sigma$
distances on the base line with the
approximate percentage area for the several intervals, as shown in the top 
curve of Fig. 6.1. 

Now each possible sample mean will lead to a lower limit of M — 4 and 
an upper limit of M + 4. If we consider the 19 per cent of sample means 
expected between 9 and 10, we see at once that these 19 will lead to intervals 
with lower limits between 5 and 6 and upper limits between 13 and 14. 
That is, the sample means falling between 9 and 10 will generate that part 
of the lower limit (LL) curve of Fig. 6.1 between 5 and 6 and that part of
the upper limit (UL) curve between 13 and 14. Likewise the 15 per cent
of sample means falling between 8 and 9 will lead to the 4 to 5 part of the 
LL curve and to the 12 to 13 part of the UL curve. Similarly, as can be 
seen by careful study (a requirement for most students if understanding is 
to be achieved) of the three curves of Fig. 6.1, every left-hand segment of 
the top curve generates a left-hand segment for each of the bottom curves. 
Stated differently, the left half of the top curve leads to a distribution of
intervals with lower limits less than 6 and upper limits of less than 14. In
exactly the same fashion it can be seen that the right half of the top curve
leads to the right half of the LL curve and also the right half of the UL
curve. Thus we have a sampling distribution of intervals (sets of limits) as
found by taking $M \pm 4$ (or $M \pm 2\sigma_M$). Our next task is to ask how
many of these various intervals actually include 10, or the population mean.
Reference to Fig. 6.1 will verify that, out of 100 tries, we would expect to
get:

     4 times an interval with LL of 2 to 3   and UL of 10 to 11
     9 times an interval with LL of 3 to 4   and UL of 11 to 12
    15 times an interval with LL of 4 to 5   and UL of 12 to 13
    19 times an interval with LL of 5 to 6   and UL of 13 to 14
    19 times an interval with LL of 6 to 7   and UL of 14 to 15
    15 times an interval with LL of 7 to 8   and UL of 15 to 16
     9 times an interval with LL of 8 to 9   and UL of 16 to 17
     4 times an interval with LL of 9 to 10  and UL of 17 to 18

Fig. 6.1. Generation of confidence limits.

Notice that for every set of limits in the foregoing groups the population 
mean is in the range or interval defined by the upper and lower limits of the 
set. When we sum these expected frequencies, we see that 94 per cent of 
the sets of limits lead to intervals within which the population mean lies. If 
we had not rounded to the nearest per cent, these would sum to 95.45 per
cent. This implies that 4.55 per cent of the time the intervals so defined
would not include the population value. This can be verified by noting
that sample means of less than 6 (top curve) lead to upper limits of less than
10, and do so 2.27 per cent of the time, whereas sample means of more
than 14 produce lower limits of more than 10 about 2.27 per cent of the
time. These percentages are for the tails of the bottom curves, to the left
of the ordinate at 10 for the UL curve and to the right of this ordinate for
the LL curve.
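
The expected frequencies in the foregoing tabulation can be reproduced from
the normal curve. The following Python sketch assumes, as in the
illustration, a population mean of 10 and a standard error of 2; the
variable names are illustrative, and the normal areas come from the standard
library's error function.

```python
import math

def normal_cdf(z):
    """Area of the unit normal curve below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma_m = 10.0, 2.0
half_width = 2 * sigma_m                  # limits are M - 4 and M + 4

# Sample means from 6 to 14, cut at every half sigma_M (i.e., every 1 unit).
edges = [mu - 2 * sigma_m + k * (sigma_m / 2) for k in range(9)]

rows = []
for lo, hi in zip(edges[:-1], edges[1:]):
    pct = 100 * (normal_cdf((hi - mu) / sigma_m) - normal_cdf((lo - mu) / sigma_m))
    # A mean between lo and hi gives LL between lo-4 and hi-4, UL between lo+4 and hi+4.
    rows.append((lo - half_width, hi - half_width, lo + half_width, hi + half_width, pct))

for ll_lo, ll_hi, ul_lo, ul_hi, pct in rows:
    print(f"{pct:5.2f} per cent: LL of {ll_lo:g} to {ll_hi:g}, UL of {ul_lo:g} to {ul_hi:g}")

coverage = sum(row[4] for row in rows)    # per cent of intervals that include mu
print(round(coverage, 2))
```

Every interval listed has a lower limit below 10 and an upper limit above 10,
so the total of the percentages is the proportion of intervals that include
the population mean.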

In summary, if we were to make in our lifetime 100 inferences concerning
population means on the basis of sample values by each time taking the
limits as $M \pm 2\sigma_M$, the limits so established would include the
population value about 95 per cent of the tries. That is, in the long run we
would be correct about 95 per cent of the time in concluding that the
population value is within the intervals so determined, and about 5 per cent of
the time we would be in error. If we used $M \pm 1.96\sigma_M$ for setting
limits, we would be correct 95 per cent, and in error 5 per cent, of the time.
When we take $M \pm 1.96\sigma_M$ as a confidence interval, the degree of faith
in such limits is represented by a P of .95; i.e., the level of confidence for
such an inference is represented by a probability-type figure of .95. If we
wish to be surer of our inferences, we might choose the .99 level of
confidence, which in practice can be attained by taking $M \pm 2.58\sigma_M$ as
limits.

The limits set by the confidence interval method are so very similar to
fiducial limits, and the level of confidence, sometimes referred to as the
confidence coefficient, is so much like fiducial probability that the beginning
student can well let the mathematical statistician worry about the
theoretical difference between what seem to be two ways of doing the same thing.

The preceding illustration of the meaning of interval estimation was
based on a presumed known $\sigma$; in practice we will have a sample estimate,
$S$, hence $S_M$, as a basis for calculating limits. Since $S_M$ will vary from
sample to sample (because of varying $S$s), the width of the interval will
vary from sample to sample and, therefore, it might be inferred that using
$M \pm 1.96S_M$ would not lead to intervals that overlap $\mu$ 95 per cent of
the time. But since the width of the interval will sometimes be too short and
sometimes too long, there is a balancing effect for $N$ not too small.

Confidence intervals can be set up for statistical measures other than the 
mean, but if the random sampling distribution of a given measure is 
nonnormal, the method will not be the simple stunt of taking $C \pm 1.96S_C$
or $C \pm 2.58S_C$, where $C$ stands for any statistical measure. It should be
obvious that, since the standard errors for all statistical measures are a
function of $N$, it is possible by increasing the sample size to narrow the
confidence interval without any loss in the degree of confidence with which 
we accept the limits. 

Confidence interval for a difference. There are times when it is desir¬ 
able not only to know whether a difference is significant but also to specify 
limits for the population difference. Such specification does not presume 
that a significant difference has been found. Even when a difference fails 
to reach significance, the specification of confidence limits gives some idea
of the possible difference between population values, and such information
may help answer the nonstatistical question of whether the population
difference is apt to be large enough to be of practical or scientific
importance. This procedure may be helpful in evaluating the consequences of
accepting the null hypothesis when in reality the hypothesis is false.

Furthermore, the setting up of a confidence interval may be particularly 
helpful when we have obtained a difference which is highly significant. 
Consider the case of a difference of 4.78 inches in mean height between 
men and their sisters. Because of large Ns and the presence of brother- 
sister correlation, the standard error of the difference is very small; its value 
is about .07. When we compute $D/S_D$ we have a $z$ of 68. This would, if
we could evaluate it, yield a probability, for as large a difference by chance,
which would be so microscopically small that we could not comprehend it.
However, when we set confidence limits at, say, the .99 level, we have
4.78 ± 2.58(.07), or 4.60 and 4.96, as limits for the population difference.
This permits a down-to-earth way for evaluating the obtained difference. 
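
The arithmetic of the height example can be written out as a small Python
sketch. The function name is illustrative; the figures 4.78, .07, and 2.58
are those given in the text.

```python
def confidence_limits(estimate, standard_error, z_value):
    """Lower and upper limits: estimate -/+ z_value times the standard error."""
    margin = z_value * standard_error
    return estimate - margin, estimate + margin

difference = 4.78      # mean height difference in inches
se_difference = 0.07   # standard error of the difference
z = difference / se_difference
lower, upper = confidence_limits(difference, se_difference, 2.58)  # .99 level

print(round(z), round(lower, 2), round(upper, 2))
```

The same function serves for the .95 level by passing 1.96 in place of 2.58.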

Level of confidence vs. level of significance. The term "level of
confidence" should not, as it frequently is, be misused in place of "level of
significance." The first term pertains to interval estimation, the other to
hypothesis testing. 


QUESTION OF ASSUMPTIONS 

It may be well to consider briefly the assumptions underlying the pro¬ 
cedures so far discussed for making statistical inferences, since assumptions 

restrict the applicability of a method. 

Independence of sampling units. It is assumed that the conditions 
of random sampling hold, but the frequency with which the requirement of 
independence is violated by researchers suggests that a warning is needed. 
The violation usually comes about when multiple measurements or 
observations are made on each of the individuals in a sample and each
measurement (or response) is treated as a sample value, thereby inflating
$N$ $n$-fold when $n$ repeated measurements (or responses) are available for
each person. The lack of independence comes about in that, for instance, 
if the sample of individuals happened to include one high scoring person 
there would automatically be n high scores. The effect of such an inflation 
of N is an illegitimate reduction in standard errors. 

Infinite vs. finite universe. If we are sampling from a finite universe,
particularly a universe with a rather small number of cases, it seems
reasonable to think that as the sample size becomes large relative to the
number of cases in the universe, the sample mean, for example, will tend
to fluctuate less from the universe mean than it does when we are drawing
from an infinite population. This suggests that the standard error formulas
need to be modified for the finite population situation. The required
modifications are available for only a few statistical measures. If we let N
represent the sample size and $N_{pop}$ the size of the finite universe, the
standard errors for the mean and for a proportion are approximately as
follows:

$$S_M = \frac{S}{\sqrt{N}} \sqrt{1 - \frac{N}{N_{pop}}}, \qquad S_p = \sqrt{\frac{pq}{N}} \sqrt{1 - \frac{N}{N_{pop}}}$$


In a given research it is sometimes difficult to decide whether the universe
being sampled is finite or infinite in size, and, if finite, it is not always easy
to determine the value of $N_{pop}$. It might be argued that psychologists never
study an infinite universe. It can readily be seen, however, that the
corrective factor in the sampling error formulas becomes negligible when N is
small relative to $N_{pop}$. Thus, if $N_{pop}$ is known to be large relative to N, it
matters little whether the given universe is wrongly conceived as being
infinite. For example, when N is .01 of $N_{pop}$ the corrective term leads to a
reduction in sampling error of about .005 of the value obtained by the
ordinary formulas.
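
The .005 figure can be checked with a short sketch. It assumes the corrective factor is the square root of (1 - N/N_pop), which is the form that reproduces the reduction stated above; the function names and the sample sizes of 100 and 10,000 are illustrative only.

```python
import math

def finite_correction(n, n_pop):
    """Corrective factor sqrt(1 - N/N_pop) for a finite universe."""
    return math.sqrt(1.0 - n / n_pop)

def se_mean_finite(sd, n, n_pop):
    """Standard error of a mean when sampling from a finite universe."""
    return sd / math.sqrt(n) * finite_correction(n, n_pop)

def se_proportion_finite(p, n, n_pop):
    """Standard error of a proportion when sampling from a finite universe."""
    return math.sqrt(p * (1.0 - p) / n) * finite_correction(n, n_pop)

# Sample is .01 of the universe (hypothetical sizes).
reduction = 1.0 - finite_correction(100, 10_000)
print(round(reduction, 4))
```

When the sample is a larger share of the universe, say one half, the same factor removes nearly 30 per cent of the ordinary standard error.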

These formulas for the finite universe situation are frequently useful
when we wish to compare a subgroup with a total group which contains the
subgroup. Such a comparison is sometimes erroneously made by taking
$\sqrt{S_t^2/N_t + S_s^2/N_s}$ as the standard error of the difference between the
subgroup mean, $M_s$, and the total mean, $M_t$. This makes no allowance for
the fact that the two means are not based on independent groups. An
appropriate procedure is to regard $M_s$ as based on a sample drawn from a
finite universe of $N_t$ cases with mean and standard deviation of $M_t$ (as $\mu$)
and $\sigma_t$; then with the standard error of $M_s$ taken as

$$\sigma_{M_s} = \frac{\sigma_t}{\sqrt{N_s}} \sqrt{1 - \frac{N_s}{N_t}}$$

we can test the significance of the deviation of $M_s$ from $M_t$ by using the
ratio $(M_s - M_t)/\sigma_{M_s}$, which is interpretable as a z. This ratio will give a
very close approximation to the z which would be obtained if we were to
compare the subgroup with the remainder (the total cases less the subgroup)
as two independent groups, using the usual formula for standard error of
the difference. The foregoing scheme would also be applicable in case
proportions instead of means were the descriptive measures used as a basis
for comparison.
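
As a rough sketch of the subgroup comparison, the function below treats the total group as a finite universe and applies the finite-universe standard error to the subgroup mean. The function name and all of the figures are hypothetical.

```python
import math

def z_subgroup_vs_total(m_sub, n_sub, m_total, sd_total, n_total):
    """z for a subgroup mean regarded as a sample from the total group."""
    # Standard error of the subgroup mean with the finite-universe correction.
    se = sd_total / math.sqrt(n_sub) * math.sqrt(1.0 - n_sub / n_total)
    return (m_sub - m_total) / se

# Hypothetical data: a subgroup of 40 drawn from a total group of 400.
z = z_subgroup_vs_total(m_sub=52.0, n_sub=40, m_total=50.0,
                        sd_total=10.0, n_total=400)
print(round(z, 3))
```

Here the standard error works out to 1.5, so a 2-point deviation gives a z of about 1.33.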

Skewed distributions. The standard error formulas given in this
chapter assume normal or nearly normal score distributions for the
population being sampled. Skewness is the most frequently encountered
evidence for nonnormality, and accordingly it is of interest to consider the
effect of skewness on the sampling distribution of the mean, the measure most
apt to be involved in testing hypotheses. The relationship between the degree
of skewness, $g_1$, for a variable and the amount of skewness for the sampling
distribution of means is $g_{1M} = g_1/\sqrt{N}$. Thus the skewness in the
distribution of means rapidly disappears as N is taken larger and larger. For
example, if $g_1$ is .77 (see Fig. 3.1, p. 27) and N is 35, the skewness for the
sampling distribution of means will be only .13 (see Fig. [...]).
Accordingly the procedures in this chapter may be safely used with
moderately skewed distributions when N is large and with markedly
skewed distributions when N is very large. Some methods for handling
nonnormal data will be discussed in Chapter 19.
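
The arithmetic of the example can be verified with a brief sketch; the function name is ours, and the relation used is the one stated above, skewness of means equal to the skewness of the variable divided by the square root of N.

```python
import math

def skewness_of_means(g1, n):
    """Skewness of the sampling distribution of means: g1 / sqrt(N)."""
    return g1 / math.sqrt(n)

g_means = skewness_of_means(0.77, 35)
print(round(g_means, 2))

# Quadrupling N halves the remaining skewness.
g_means_larger_n = skewness_of_means(0.77, 140)
```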

A FURTHER WORD ON PROPORTIONS 

The student will have noted that the general principles of statistical
inference set forth in Chapter 5 have been utilized and extended in the
present chapter. There are many points of obvious similarity in the two
chapters, but there is an additional parallelism which is not obvious. For
an attribute involving a dichotomy such as yes-no, like-dislike, pass-fail,
etc., we may arbitrarily assign a score of 1 to one category and a score of 0
to the other. That is, X = 0 or 1.

Table 6.2. Scheme for mean and standard deviation of a dichotomous variable

Response    X    f      fX               fX^2
Yes         1    f_1    f_1(1) = f_1     f_1(1)^2 = f_1
No          0    f_0    f_0(0) = 0       f_0(0)^2 = 0
Sums             N      ΣX = f_1         ΣX^2 = f_1


Let $f_0$ and $f_1$ stand for the frequency of, say, no and yes responses
respectively in a sample of N cases. Thus we have a miniature frequency
distribution, with the two categories being analogous to two intervals.
Let us consider the mean and standard deviation of this miniature frequency
distribution, both in terms of gross score formulas. Notice that in Table
6.2 we have a score column, X, a frequency column, f, an fX and an $fX^2$
column (analogous to fd and $fd^2$, with d = X). It will be seen that
$\Sigma X = f_1$; hence the mean of the distribution is $M = \Sigma X/N = f_1/N = p$,
where p is the proportion of yeses. Hence a proportion may be regarded as a
mean. It will also be seen that $\Sigma X^2 = f_1$; hence when we utilize formula
(3.7) to write the variance of the distribution we have

$$S^2 = \frac{1}{N^2}\left[N \Sigma X^2 - (\Sigma X)^2\right] = \frac{1}{N^2}\left[N f_1 - (f_1)^2\right] = \frac{N f_1}{N^2} - \frac{f_1^2}{N^2} = p - p^2 = pq$$

Hence $S = \sqrt{pq}$ is the standard deviation of the dichotomous distribution.
(Any connection with the binomial?)
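
A small numerical check may help here. The sketch scores a hypothetical set of 30 yes and 20 no responses as 1 and 0 and applies the gross score formulas directly; the variable names are ours.

```python
import math

# Hypothetical sample: 30 "yes" responses scored 1, 20 "no" responses scored 0.
responses = [1] * 30 + [0] * 20

n = len(responses)
sum_x = sum(responses)                   # equals f_1, the number of yeses
sum_x2 = sum(x * x for x in responses)   # also equals f_1, since 1 squared is 1

mean = sum_x / n                               # gross score mean
variance = (n * sum_x2 - sum_x ** 2) / n ** 2  # gross score variance (divisor N)

p = sum_x / n
q = 1.0 - p
print(mean, variance, p * q, math.sqrt(variance))
```

The mean equals the proportion of yeses and the variance equals pq, as the derivation requires.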

In this chapter we have given $S_M = S/\sqrt{N}$ as the standard error of a
mean. If this holds for the dichotomous distribution we would have
$S_M = \sqrt{pq}/\sqrt{N} = \sqrt{pq/N}$. But this is the same as that given by formula
(5.2). This is as it should be since p = M for the dichotomous distribution.
Furthermore, formula (5.5) for the standard error of the difference
between correlated proportions has its analogue in the development on
pp. 80-82 for the difference between correlated means, and formula (5.6)
involves a pattern similar to that of formula (6.9).


NOTE ON THE PROBABLE ERROR 

An antiquated procedure is the use of the probable error, pe, instead of
the standard error in connection with sampling. The pe of the mean is
$.6745\sigma_M$, and therefore we would expect 50 per cent of successive sample
means to fall between $\mu \pm pe_M$. Similarly, the pe for any other statistical
measure is .6745 times its standard error. The student who attempts to
survey the research literature on a given topic is apt to encounter pes and
he therefore must know the relationship of the pe to the standard error.
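
For the reader who meets a pe in the older literature, the conversion can be sketched as follows. The function names are ours, and the coverage check assumes a normal sampling distribution, computed from the error function in the standard library.

```python
import math

PE_FACTOR = 0.6745  # probable error = .6745 times the standard error

def probable_error(standard_error):
    return PE_FACTOR * standard_error

def standard_error_from_pe(pe):
    return pe / PE_FACTOR

def central_area(z):
    """Proportion of a normal distribution lying within plus or minus z."""
    return math.erf(z / math.sqrt(2.0))

pe = probable_error(4.0)              # hypothetical standard error of 4
recovered = standard_error_from_pe(pe)
coverage = central_area(PE_FACTOR)    # should be very nearly one half
print(round(pe, 3), round(recovered, 3), round(coverage, 3))
```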

NOTE ON NOTATION 

We have used the Greek letter $\mu$ as the symbol for population mean and the
corresponding Latin letter M for a sample mean. Another frequently
used symbol for a sample mean is $\bar{X}$ (read X bar); later in this text we will
use the bar to indicate a sample mean. The student needs to know both M
and $\bar{X}$ as symbols. We have used $\sigma$ as a symbol for the standard deviation
of a population and also for the standard deviation of a theoretical
distribution, such as the binomial or (the definition formula of) the normal
curve; the Latin equivalents, S and s, stand for sample standard deviation,
one biased, the other unbiased. As shall be seen, we need in the sequel
both S and s. Consistency in notation would call for p (or P) as a sample
proportion and the corresponding Greek $\pi$ as a population value, but
use of $\pi$ was long ago taken by mathematicians as the symbol for something
else, so to avoid confusion we used $p_{pop}$ instead of $\pi$. Later we will use r
and $r_{pop}$ as symbols for sample and population correlation coefficients
because $\rho$ (rho) has, as we shall see, been used to signify a particular kind of
correlation coefficient.



Chapter 7 

SMALL SAMPLE OR 
t TECHNIQUE 


Although the general principles of statistical inference are the same for
both large and small samples, the techniques differ. We shall confine our
attention in this chapter to inferences concerned with a single mean and
with the difference between two means. Chapter 14 will deal with
inferences concerning variabilities.

It will be recalled that the sampling distribution of the mean is normal
when the trait distribution is normal. This holds regardless of sample size.
The sampling distribution of means centers at the population mean with a
standard deviation of $\sigma_M = \sigma/\sqrt{N}$, which sigma we termed the true
standard error of the mean. Recall also that the relative deviates,
$(M - \mu)/\sigma_M$, follow the normal curve. When successive samples are
drawn and an $S_M$ is computed for each sample by using the sample S
instead of $\sigma$ (an unknown), the ratios of given $(M - \mu)$ to their $S_M$
values so computed will be distributed normally for very large Ns and
approximately so for Ns of moderate size, but for Ns as small as 30 the
approximation is none too good. The value 30 is arbitrarily chosen;
the approximation to normality becomes progressively worse as we go from
large to small Ns rather than becoming abruptly worse in the vicinity of
30.

As an estimate of the population variance, $S^2 = \Sigma x^2/N$ suffers from bias,
whereas $s^2 = \Sigma x^2/(N - 1)$ is an unbiased estimator of the population
variance. Since the bias in $S^2$ increases with a decrease in N, it is important
to use the unbiased estimator when N is small. We will accordingly use
$s_M = s/\sqrt{N}$ instead of $S/\sqrt{N}$, a nonnegligible improvement in
the estimate of the standard error of a mean based on a small sample.
Even so, the successive sample ratios, $(M - \mu)/s_M$, with $s_M$ computed
from each sample, will not be normally distributed when N is small. The
ratio has a variable numerator, which is normally distributed, and a
denominator which also varies from sample to sample. The distribution of
such ratios will be symmetrical about zero but will be leptokurtic. That is,
it is characteristic of the distribution that its tails are higher than those
of the normal curve. Thus there will be relatively more large ratios.

The t distribution. It can be shown that such ratios, involving a
normally distributed deviate divided by an unbiased estimate of its
sampling error, will follow the so-called t distribution, defined by

$$y = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}$$

Although this equation is beyond the comprehension of most students, it
should be noted that, since the height of y depends on n, there will be
not just one but many distributions of t, one for each value of n. The
accompanying figure shows the curve of t when n = 7 and when n = 3, as
compared with the normal curve. Table E gives the values of t, for ns of
1 to 30, which will be exceeded a given proportion of times. Thus for
n = 30 we see from Table E that the t at the .01 point is somewhat larger
than the normal deviate, and the required t is larger still for n = 10, as
compared with 2.58 for the normal curve. The n of the equation for t, and
in the t table, is the number of degrees of freedom (df) involved in the
estimate of the population variance. The df depends on how many of the
deviations are “free to vary.” Suppose two scores, 3 and 5. Their mean
is 4 and the sum of deviations, (3 - 4) + (5 - 4), must equal zero; once
one deviation is known the other is fixed, so that only one deviation is
free to vary. Or take three scores, 3, 4, and X, which yield a mean of 4.
The deviations must satisfy the restriction that they sum to zero; i.e.,
(3 - 4) + (4 - 4) + (X - 4) = 0. Thus one of the three deviations is
fixed by the other two, i.e., is not independent of their values, because
the three deviations must sum to zero. Or, turning to symbols for scores,
suppose that $X_1$, $X_2$, $X_3$, and $X_4$ represent four scores, and it is
reported that they yield a mean of 40. The four deviations can be written
as $(X_1 - 40) + (X_2 - 40) + (X_3 - 40) + (X_4 - 40)$, a sum which must
equal zero. Any three of the deviations may vary freely, but the fourth is
then fixed; one restriction has been imposed. Actually, this restriction
comes about because we are taking deviations about one constant, the mean,
computed from the set of scores at hand. The df for any sum of squares (of
deviations) about a mean is always 1 less than the number of scores used to
compute the mean. In general, the df for a sum of squares is equal to the
number of squares minus the number of restrictions imposed by constants
computed from the data.
Note that the unbiased estimate of the population variance,
$s^2 = \Sigma x^2/(N - 1)$, involves dividing by df, the number of degrees of
freedom. This is a general rule.

Computation of $s^2$ or s. For N small the mean and $s^2$ or s are readily
computed from gross score formulas. Thus $M = \Sigma X/N$. To compute $s^2$
or s we need $\Sigma x^2$ in terms of gross scores. This was given earlier as

$$\Sigma x^2 = \frac{1}{N}\left[N \Sigma X^2 - (\Sigma X)^2\right] \qquad (3.6)$$

Division of this by N - 1 yields $s^2$, the square root of which is the required
s. An easily derived relationship between $s^2$ and $S^2$ is

$$s^2 = \frac{N S^2}{N - 1}$$
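
The gross score computation can be sketched in a few lines. The five scores are hypothetical, and the function name is ours; the sum of squares follows formula (3.6).

```python
import math

def sum_of_squares(scores):
    """Sum of squared deviations from gross scores, as in formula (3.6)."""
    n = len(scores)
    return (n * sum(x * x for x in scores) - sum(scores) ** 2) / n

scores = [3, 5, 4, 6, 7]          # hypothetical small sample
n = len(scores)

ss = sum_of_squares(scores)
unbiased_var = ss / (n - 1)       # s squared: divisor is df = N - 1
biased_var = ss / n               # S squared: divisor is N
s = math.sqrt(unbiased_var)

# The two variances are related by s^2 = N * S^2 / (N - 1).
print(ss, unbiased_var, biased_var, n * biased_var / (n - 1))
```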


Although we do not need a frequency distribution for purpose of compu¬ 
tations, a distribution should be made anyway so as to permit at least a 
rough check on the assumption that the scores have been drawn from a 
normally distributed population of scores. 

t for a single mean. We can test the significance of M as a deviation
from any hypothesized value for the mean, $M_h$, by taking $t = (M - M_h)/s_M$
as an entry in Table E, with n = df = N - 1, to see whether the obtained
t reaches the t value required for certain levels of significance. If the t does
not reach the value required for the chosen level of significance the
deviation would be attributed to chance and the hypothesis accepted.

If we wish to specify the confidence limits for the unknown population
mean and to do so with a level of confidence indicated by P = .99, we first
note from the table of t how large t must be, for the given df, to correspond
to the .01 probability level. Then M plus and minus the t so found
times $s_M$ will give the desired limits. For example, suppose nine cases yield
a mean of 80 and a sum of squares of 1152. Dividing the sum of squares by
df or 8 we get $s^2$ = 144, s = 12 as an estimate of $\sigma$, and $s_M = 12/\sqrt{9} = 4$.
For 8 df we find from Table E that t = 3.355 for the .01 level. Then
80 ± (3.355)(4) gives 66.58 and 93.42 as the .99 confidence limits for the
population mean. If we used the large sample method of Chapter 6 we
would have $S^2$ = 1152/9, giving S as 11.31, from which we would get
$S_M = 11.31/\sqrt{9} = 3.77$. Since for the normal distribution a relative deviate
of 2.575 corresponds to the .01 level, we have 80 ± (2.575)(3.77) or 70.29
and 89.71 as the .99 confidence limits for the universe mean. These values
for the confidence interval differ appreciably from those obtained
previously when proper allowance was made for the smallness of the sample.
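
The two sets of limits in this example can be reproduced with a short sketch. The function name and its arguments are ours; the critical values 3.355 and 2.575 are simply the tabled figures quoted in the text, entered as constants rather than computed.

```python
import math

def confidence_limits(mean, sum_sq, n, critical, unbiased=True):
    """Limits mean +/- critical * standard error.

    unbiased=True divides the sum of squares by N - 1 (small sample method);
    unbiased=False divides by N (large sample method of Chapter 6).
    """
    divisor = n - 1 if unbiased else n
    sd = math.sqrt(sum_sq / divisor)
    se = sd / math.sqrt(n)
    return mean - critical * se, mean + critical * se

# Nine cases, mean 80, sum of squares 1152 (figures from the text).
t_limits = confidence_limits(80, 1152, 9, critical=3.355)
z_limits = confidence_limits(80, 1152, 9, critical=2.575, unbiased=False)

print([round(v, 2) for v in t_limits])
print([round(v, 2) for v in z_limits])
```

The interval based on t is wider by roughly 7 points in all, which is the allowance made for the smallness of the sample.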

Difference between correlated means. It will be recalled that when
we have two means based on the same individuals or on paired cases, the
test of significance of the difference must make allowance for the fact that
the two sets of scores are not random with respect to each other. In
Chapter 6 we saw that this could be done by including the r term in the
standard error of the difference, as in formula (6.8), or by working directly
with the differences between paired scores. It was shown that $M_D = D_M$
and that $S_{M_D} = S_{D_M}$. When we have small samples, it is easier to work
with $M_D$, with $s_D$ as an estimate of the $\sigma$ of the distribution of
differences between paired scores, and thence with $s_{M_D}$. To get the best
estimate of the sampling error of $M_D$, we need the sum of squares of the
deviations of the pair differences from the mean difference, i.e.,
$\Sigma(D - M_D)^2$, which when divided by the proper df or N - 1, where N is the
number of differences or the number of paired scores, gives the best estimate
of the variance of the universe distribution of differences. Let $s_D^2$ stand
for this estimate. Then

$$s_{M_D} = \frac{s_D}{\sqrt{N}}$$

The computation is straightforward. Each of the Ds is the difference
between two scores, the subtraction being made in the same direction for
all, and the sum of squares, $\Sigma(D - M_D)^2$, is obtained by formula (3.6)
with the Xs replaced by Ds; that is,

$$\Sigma(D - M_D)^2 = \frac{1}{N}\left[N \Sigma D^2 - (\Sigma D)^2\right]$$

The Ds are summed algebraically, and their squares are summed. After
$s_{M_D}$ has been calculated, we get t as $M_D/s_{M_D}$. The hypothesis to be tested is
that the universe value of $M_D$ is zero; the table of t is entered with the
obtained t and with df = N - 1 in order to see whether it reaches a
prescribed level of significance. Note that the df is 1 less than the number of
Ds, not 1 less than the total number of scores (see “Further note” on dfs,
p. 104).

The assumption of normality pertains to the Ds; hence, again, even
though a frequency distribution is not needed for computational purposes,
it should be made so as to provide a rough check on the assumption. A
confidence interval for $M_D$ (and consequently $D_M$) can be set up in precisely
the same manner as indicated previously for a single mean.
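
The steps for correlated means may be sketched as follows. The function name and the five pairs of scores are hypothetical; the obtained t would still have to be referred to Table E with df = N - 1.

```python
import math

def paired_t(first, second):
    """t for the mean difference of paired scores, with df = N - 1."""
    diffs = [a - b for a, b in zip(first, second)]  # same direction for all
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sum of squares of the Ds from gross scores, as in formula (3.6).
    ss = (n * sum(d * d for d in diffs) - sum(diffs) ** 2) / n
    s_d = math.sqrt(ss / (n - 1))     # estimate of sigma of the differences
    se_mean_d = s_d / math.sqrt(n)    # standard error of the mean difference
    return mean_d / se_mean_d, n - 1

before = [12, 15, 11, 14, 13]   # hypothetical paired scores
after = [10, 14, 10, 11, 12]

t_value, df = paired_t(before, after)
print(round(t_value, 3), df)
```

Note that the df returned is 1 less than the number of pairs, not 1 less than the ten scores involved.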

Difference between independent means. Given: two groups of $N_1$
and $N_2$ cases, and that we wish to test the significance of the difference,
$D_M = M_1 - M_2$. By the procedure of Chapter 6 for large Ns, we would
make the necessary calculations for determining $D_M/S_{D_M}$ or z. As an aid
to transition in thought from z to t, let us first write the expression for z
thus,

$$z = \frac{D_M}{S_{D_M}} = \frac{M_1 - M_2}{\sqrt{S_{M_1}^2 + S_{M_2}^2}} = \frac{M_1 - M_2}{\sqrt{\dfrac{S_1^2}{N_1} + \dfrac{S_2^2}{N_2}}}$$

which involves the two sample variances. Now, for the small sample
situation, we need $t = D_M/s_{D_M}$, where $s_{D_M}$ is to be the best possible estimate
of the standard error of the difference. To get this we apparently need the
best possible estimates of the two variances of the two populations from
which the samples have been drawn. But here we encounter an assumption
underlying t for this situation: the two populations must have the same
variance. Hence, we need just one estimate, an estimate of the variance
common to the two populations. Calling this estimate $s^2$, by analogy with
the z technique, we need

$$t = \frac{M_1 - M_2}{\sqrt{\dfrac{s^2}{N_1} + \dfrac{s^2}{N_2}}}$$

The best estimate, $s^2$, of the common population variance is obtained by
computing the sum of squares separately for the two samples, then
combining these sums, and dividing by the proper df, or

$$s^2 = \frac{\Sigma x_1^2 + \Sigma x_2^2}{N_1 + N_2 - 2}$$

The two separate sums are computed by formula (3.6). Note that 2
degrees of freedom are lost because the sum of squares is about two means,
which leads to two restrictions. Substitution of the obtained $s^2$ in the
foregoing expression leads to a t, which is looked up in Table E with df,
or n, equal to $N_1 + N_2 - 2$ in order to see whether it reaches a chosen level
of significance.
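
A minimal sketch of the computation for independent means follows, assuming equal population variances as the method requires. The function name and the two small groups of scores are hypothetical.

```python
import math

def pooled_t(group_1, group_2):
    """t for two independent means with a pooled variance estimate."""
    n1, n2 = len(group_1), len(group_2)
    m1, m2 = sum(group_1) / n1, sum(group_2) / n2
    # Sums of squares taken about each group's own mean.
    ss1 = sum((x - m1) ** 2 for x in group_1)
    ss2 = sum((x - m2) ** 2 for x in group_2)
    df = n1 + n2 - 2                   # two means fitted, two restrictions
    pooled_var = (ss1 + ss2) / df      # estimate of the common variance
    se_diff = math.sqrt(pooled_var / n1 + pooled_var / n2)
    return (m1 - m2) / se_diff, df

group_1 = [5, 7, 9]        # hypothetical scores
group_2 = [1, 3, 5, 7]

t_value, df = pooled_t(group_1, group_2)
print(round(t_value, 3), df)
```

Here the pooled variance is 28/5 = 5.6, and the t of about 1.66 would be referred to Table E with 5 df.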

There is one point in the method of determining the s 2 , needed for testing 
the significance of the difference between means, which may have puzzled 
the student. The setting of the null hypothesis, in combination with the 
assumption of equal population variances, implies that the two samples 
have been drawn from a single universe or from two universes which have 
the same mean and equal variances, for the given and measured trait. It 
might accordingly be assumed that the best estimate of the population 
variance would be obtained by taking the sum of squares about the
combined mean rather than about the separate means. The former would give
the better estimate of the variance if it were actually known that the two 
universe means were the same (or that only one universe was involved), 
but there is always the possibility that the two universe means really differ. 
If they do differ, the taking of the sum of squares about the combined mean 
would, in general, yield too large an s 2 for the simple reason that the real 
difference between groups would be contributing to the variability of the 
two groups combined. (The student who has difficulty seeing this point 
should imagine what would happen to the variance of scores when two 
groups markedly different in means were combined.) It follows, therefore,
that in the long run the best value for s 2 will be provided by summing the 
sums of squares about the two means. 
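
The point about the combined mean can be made concrete with a deliberately extreme, hypothetical pair of groups whose means differ by 10 points while the scores within each group are close together.

```python
def mean(scores):
    return sum(scores) / len(scores)

def ss_about(scores, center):
    """Sum of squared deviations of the scores about a given center."""
    return sum((x - center) ** 2 for x in scores)

group_a = [1, 2, 3]        # hypothetical scores, mean 2
group_b = [11, 12, 13]     # hypothetical scores, mean 12
n_total = len(group_a) + len(group_b)

# Sums of squares about the separate means, df = N1 + N2 - 2.
ss_separate = ss_about(group_a, mean(group_a)) + ss_about(group_b, mean(group_b))
var_separate = ss_separate / (n_total - 2)

# Sum of squares about the combined mean, df = N1 + N2 - 1.
ss_combined = ss_about(group_a + group_b, mean(group_a + group_b))
var_combined = ss_combined / (n_total - 1)

print(var_separate, var_combined)
```

The estimate about the combined mean is some thirty times the other, because the real difference between the groups is counted as if it were variability within them.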

The procedure for setting a confidence interval when we have
independent means is no different from that for correlated means. Simply take
$D_M \pm t_\alpha s_{D_M}$, where $t_\alpha$ is the t, for the given df, required for
significance at the P = $\alpha$ level. This will give limits for the P = 1 - $\alpha$
level of confidence. Suppose we wish the .99 confidence interval; this
requires an $\alpha$ of .01, or as sometimes written, $t_\alpha = t_{.01}$, where $t_{.01}$ is
found under the P = .01 column, opposite the df.

Further note on degrees of freedom. Suppose two independent groups
with $N_1 = N_2 = N$, and also two groups of scores based on N cases (or N
paired persons). For the former the df is $N_1 + N_2 - 2 = 2N - 2$,
whereas for the latter the df is N - 1 even though in the paired situation
the total number of persons is 2N. This may be (and has been) confusing
to some; it seems as though the obviously better plan (matching) leads to a
loss in df compared to the setup involving independent groups. It is
sometimes argued that the df would perhaps be larger if we worked not
with the difference scores but with the two sets of scores in terms of the
sums of squares of deviations for each set and the sum of cross products
since, as can be seen from p. 81,

$$\Sigma(D - M_D)^2 = \Sigma x_1^2 + \Sigma x_2^2 - 2\Sigma x_1 x_2$$

The df for the left-hand sum of squares is obviously N - 1, and since the
right-hand side of the equation is merely an algebraic variant of the
left-hand side, it does not seem reasonable to believe that the dfs will differ
for the two sides. Note that if we consider $\Sigma x_1^2$ as having N - 1 degrees of
freedom, we cannot have any more degrees of freedom for the other sums
on the right side because the $x_2$ values are not independent of the $x_1$ values;
they (the $x_2$ scores) are not “free to vary.”
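
That the two sides are merely algebraic variants of one another can be checked numerically. The sketch assumes the identity in its usual form, the sum of squares of the differences equal to the two sums of squares less twice the sum of cross products; the paired scores are hypothetical.

```python
def deviations(scores):
    """Deviations of the scores from their own mean."""
    m = sum(scores) / len(scores)
    return [x - m for x in scores]

first = [12, 15, 11, 14, 13]    # hypothetical paired scores
second = [10, 14, 10, 11, 12]

# Left side: sum of squares of the pair differences about their mean.
d = deviations([a - b for a, b in zip(first, second)])
left = sum(v * v for v in d)

# Right side: two sums of squares less twice the sum of cross products.
x1, x2 = deviations(first), deviations(second)
right = (sum(v * v for v in x1) + sum(v * v for v in x2)
         - 2 * sum(a * b for a, b in zip(x1, x2)))

print(round(left, 6), round(right, 6))
```

Both sides come to 3.2 for these scores, so no degrees of freedom can be gained by working with the right-hand form.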

Comparison of changes. In Chapter 6 (p. 86) we discussed the
procedures for testing the differences between changes shown by two groups.
For the situation involving paired persons, a D for the difference between
changes for the members of a pair was defined (p. 88), and the test of
significance involved computing, for Ds so defined, an $M_D$, $S_D$, and thence
$S_{M_D}$. For the small sample, or t, technique we need $s_D$ and $s_{M_D}$, just as
given previously for correlated means. The df is 1 less than the number of
pairs. For the setup involving the changes for independent groups, we
would need an $s_{D_D}$ instead of the $S_{D_D}$ of (6.10). The required $s_{D_D}$ is
given by

$$s_{D_D} = \sqrt{\frac{s^2}{N_E} + \frac{s^2}{N_C}}$$

in which

$$s^2 = \frac{\Sigma(D - M_{D_E})^2 + \Sigma(D - M_{D_C})^2}{N_E + N_C - 2}$$

with the subscripts E and C referring to experimental and control groups.
Thus the procedure for testing hypotheses involving changes for two
groups is precisely the same as that for testing the difference between two
independent means, discussed previously; X is replaced by D, a difference
score.

One-tailed versus two-tailed test. Our discussion of the t technique so
far has been in terms of the t value needed for a two-tailed test at a given
level of significance. If the hypothesis to be tested or the decision to be 
made logically warrants a one-tailed test, the t required for significance at 
the .01 level would be found under the .02 column of Table E, and for the 
.05 level the .10 column would be used. Those who do not wish to be 
restricted to the P levels given in Table E will find for dfs up to 20 the P 
associated with any t in Table XLY of Peters and Van Voorhis’ Statistical 
procedures and their mathematical bases. This table gives one-tailed values, 
which need, of course, to be doubled for two-tailed tests. 

Question of assumptions. When we use the tabled values of the t
distribution as a basis for judging significance or for setting confidence
limits we are in effect presuming that some quantity, usually a ratio such
as $(M - M_h)/s_M$ or $M_D/s_{M_D}$ or $(M_1 - M_2)/s_{D_M}$, will in the sampling sense
follow the t distribution. The mathematical proof thereof is based on
certain assumptions: normality for the population of X scores and of D
scores for the first two ratios, and normality of Xs for both populations
with common, or equal, variances for the third ratio. Whether or not
these assumptions hold will usually be unknown.

It might be thought that the assumption of normality underlying the use
of t could be tested on the basis of the sample (or samples) at hand, either
by testing the departure of $g_1$ (skewness) and $g_2$ (kurtosis) from zero or by
a chi square technique (discussed in Chapter 13), but these methods of
testing for normality are not sensitive enough to lead us to reject, on the
basis of a small sample, the hypothesis of normality unless the departure
therefrom is very marked. Likewise, the as yet undiscussed test (see
Chapter 14) for a possible difference between variances is too insensitive
when used with small samples to lead to rejection of the hypothesis of
equal variances unless the difference between the two universe variances is
sizable; hence it is difficult to be sure that the assumption of equality of
variances is tenable when two groups are being compared by the t
technique. The foregoing statements are, of course, based on the proposition
that by statistical methods it can be proved, at a desired level of significance,
that a sample distribution did not arise from a normally distributed universe
or that two universe values are different, but such methods will not prove
normality nor prove that two universe values are identical.

Since it is difficult to be sure that the assumptions will hold for a given 
batch of data, the question may be raised as to the effect of violations of the 
assumptions. Will too many or too few calculated ts reach the tabled
value for the .05 or the .01 levels of significance? Or stated differently, does
the chosen level of significance actually represent the probability of 
making the type I error? Over the years there have accumulated both 
mathematical deductions and empirical evidence indicating that the t test is 
“robust” under violation of assumptions; that is, calculated ts tend to 
follow closely the t distribution. There are exceptions to this rule, as is 
shown by the recent empirical study by Boneau.* 

Boneau, with the indispensable help of an electronic computer,
calculated 1000 ts for the difference between independent means for each of 20
different combinations of conditions with regard to Ns, shapes of
distributions in the “universes,” and equality or inequality of universe variances.
The percentage of the ts reaching the .05 and the .01 levels is indicative of 
the disruption produced by specified violations of assumptions. 

First, differences in variances ($\sigma$s of 1 and 2, or one population variance
four times that of the other; both distributions normal) for Ns very small,
5 and 5, produced about 1 per cent too many ts at the .05 and the .01 levels, 
but for Ns of 15 and 15 the discrepancies were only one tenth of a per cent. 
With samples of size 5 from the universe having the smaller variance and 15 
from the universe having the larger variance, too few reached the .05 and 
the .01 levels—the .05 level being reached only .01 of the times and the .01 
level only .001 of the trials. But when 15 cases and 5 cases were drawn, 
respectively, from the universes having the smaller and larger variances, far 
too many calculated ts reached “significance”—16 per cent at the 5 per 
cent level and 6 per cent at the 1 per cent level. The moral is clear: if we 
suspect that the variances may be unequal, we should make the two sample 
sizes equal or nearly so. Presumably, the disruption of the t test will 
depend on the relative magnitude of the two universe variances—the 
larger the variance difference, the greater the disruption. In psychological 
research, when sample sizes are large enough to permit any firm statement 
about a difference between $\sigma$s, it is rarely possible to conclude that the
$\sigma$ for one population is twice that of the other, the ratio of the $\sigma$s in
the Boneau study.
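
The unequal-variance, unequal-N conditions can be imitated on a small scale. What follows is only a rough sketch in the spirit of Boneau's sampling study, not a reproduction of his procedure: the trial count, the seed, and the function names are our own choices, and 2.101 is the two-tailed .05 value of t for 18 df.

```python
import math
import random

def pooled_t(a, b):
    """t for independent means using the pooled variance estimate."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var / n1 + pooled_var / n2)

def rejection_rate(n1, sd1, n2, sd2, critical, trials=4000, seed=1):
    """Proportion of ts reaching the critical value when the null is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sd1) for _ in range(n1)]
        b = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if abs(pooled_t(a, b)) >= critical:
            hits += 1
    return hits / trials

T_05_DF18 = 2.101  # two-tailed .05 level of t for 18 df

# 5 cases from the smaller-variance universe, 15 from the larger.
too_few = rejection_rate(5, 1.0, 15, 2.0, T_05_DF18)
# 15 cases from the smaller-variance universe, 5 from the larger.
too_many = rejection_rate(15, 1.0, 5, 2.0, T_05_DF18)
print(too_few, too_many)
```

With both population means equal, the first arrangement should yield well under 5 per cent of ts at the nominal .05 level and the second well over 5 per cent, which is the direction of the disruption described above.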

Second, when sampling from platykurtic (actually rectangular shaped)
distributions Boneau found negligible effects, but when sampling from
markedly skewed (J-shaped; $g_1$ = 2.0) distributions, Ns equal, he found
that 3 and 4 per cent of the ts reached the 5 per cent level and too few
reached the 1 per cent level. It is comforting to know that such extreme
skewness, rarely encountered in practice, will not lead to too many
significant ts.

* C. A. Boneau. The effects of violations of assumptions underlying the
t test. Psychol. Bull., 1960, 57, 49-64.

Third, although the foregoing results hold for both one- and two-tail
tests, Boneau found that when one sample was from a J-shaped distribution
and the other from either a normal or a rectangular distribution, the
distributions of the resulting calculated ts were skewed: a doubling of the
risk of falsely concluding that the mean of the J-distribution is lower than
that for the rectangular, also normal, distribution; conversely, significant
differences in the opposite direction occurred only half as often as expected
from the theoretical t curve. These results should give pause to the
advocates of one-tail tests; they also have obvious implications for two-tail
tests even though the number of ts, irrespective of direction, exceeded only
slightly the expected number.

Suppose that in one study the difference between two means for two
small samples leads to a t which falls at the .01 level and that in another
study two large samples yield means for another trait which are also
significantly different at the .01 level. Can we place as much reliance on the
first difference as on the second? The answer is yes, provided the two
studies have been carried out with the same degree of care as regards
controls and adequate sampling techniques, and provided it is safe to
presume that the fundamental assumptions underlying t are tenable. Thus
our confidence in a result based on small samples is a function not only of
the probability level of significance attained but also of our faith that
assumptions have been met. Since, as we have suggested, the conditions
of trait normality and equality of variances are exceedingly difficult to
demonstrate when the only information available is based on the small
samples at hand, we are forced to conclude that, in general, we cannot place
as much reliance on the results from small samples as on those from large
samples.

Although this last statement can, in light of Boneau’s results, be
qualified, we still have the question of the place of small samples in
psychological research, and about this there will be a diversity of opinion. We
do not propose to settle the issue or even debate it; instead, we shall
mention a few points which we feel are pertinent. There are, of course,
types of research for which it is impossible or practically impossible to
secure more than a few cases, either because of their scarcity or because of
prohibitive costs. For such situations it is fortunate that the small sample
or t technique, which permits some allowance for the smallness of the
sample or samples, is available. Quite frequently small samples may be
useful in a preliminary study carried out solely for the purpose of guiding
the experimenter. If given hypotheses seem to be verified, the next step
should be to secure more cases for further verification rather than to rush
into print with positive conclusions.

It seems to the writer that those who publish statistical results based on a 
small number of cases should, unless they are positively sure that the basic 
assumptions underlying t have been met (and this assurance can seldom be 
attained), adopt a more stringent level of significance than they would 
adopt if they had large samples. Admittedly, a more stringent criterion of 
significance means that the null hypothesis may be less frequently rejected 
and consequently that a real difference may be overlooked. At this point 
some readers may need to be reminded that the best way to avoid
committing type II errors is to avoid the use of small samples: the greater the
number of cases the greater the likelihood of detecting a difference. 

An illustration of the fact that small samples are not conducive to 
rejection of the null hypothesis unless the difference between universe 
values is sizable may be in order. Let us suppose that the means for the 
heights of two populations are 64.5 and 68.0 and that the universe standard 
deviations are both equal to 2.7. An investigator who does not know these 
facts draws a random sample of eight cases from each universe; and in 
order to help him a little (and also simplify this discussion), we tell him that 
each $\sigma = 2.7$. The standard error of the difference between means becomes
$2.7\sqrt{\tfrac{1}{8} + \tfrac{1}{8}}$, or 1.35. If the investigator accepts the .01 level of significance
it is immediately apparent that an obtained difference would have to be 
at least (2.58)(1.35), or 3.48, for him to reject the null hypothesis. (Why are 
we justified in using the normal deviate, 2.58, with such small samples?) 
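
The arithmetic of this illustration can be checked with a short sketch. It is only a check under the stated conditions (known universe standard deviations, normally distributed differences); the helper `normal_cdf` is a name introduced here for illustration and is not part of the original discussion.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the unit normal curve up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

sigma = 2.7               # universe standard deviation, as given in the text
n = 8                     # cases drawn from each universe
true_diff = 68.0 - 64.5   # difference between the two universe means

# Standard error of the difference between two independent means.
se_diff = sigma * math.sqrt(1.0 / n + 1.0 / n)

# Smallest obtained difference that reaches the .01 level (two-tailed).
critical = 2.58 * se_diff

# Obtained differences are distributed normally about true_diff; the null
# hypothesis is accepted whenever the difference lies inside +/- critical.
p_accept = (normal_cdf((critical - true_diff) / se_diff)
            - normal_cdf((-critical - true_diff) / se_diff))

print(round(se_diff, 2), round(critical, 2), round(p_accept, 2))
```

Under these assumptions the probability of accepting the null hypothesis comes out just under one-half, even though the real difference exceeds one standard deviation.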

A little consideration of the fact that the sampling distribution of differences 
between means will center at 3.5 indicates that the chances are nearly 
50-50 that the investigator will be accepting the null hypothesis even
though the real difference is more than a standard deviation in magnitude.
There are times when an investigator may be so anxious to accept the 
null hypothesis that he will seize upon a very high level of significance in 
order to better his chances for accepting the hypothesis of no difference. 
Another way for increasing the odds in favor of accepting the null hypothesis
is to use exceedingly small samples. Now those who desire to claim
that no difference exists must face the simple fact that such a proposition 
can never be proved on a sampling basis. The most convincing way to 
demonstrate that a difference is of no practical or scientific importance is 
to use large samples and the confidence interval method for specifying 
limits for the population difference.



Chapter 8 

CORRELATION: INTRODUCTION 

AND COMPUTATION 


One of the chief tasks of a science is the analysis of the interrelations of 
the variables with which it deals. In the physical sciences, and frequently 
in the biological sciences, the interrelations can be determined by noting 
how much of a change in one variable is associated with change in another. 
The physicist studying the relationship between temperature and pressure 
exerted by a gas can vary the former at will so as to determine the pressure 
at different temperatures. In the social sciences, and sometimes in the 
biological sciences, the variables studied are apt to be characteristics of 
individuals (plant or animal); thus to study relationships the experimenter 
is compelled to make measurements on several individuals. For example, 
if two variables such as height and weight are under consideration, the 
measured height and weight of N individuals will provide N pairs of 
observations from which it can be determined whether the two vary together.
In either case it is important to determine the form (mathematical)
of the relationship and the accuracy with which it is possible to make 
predictions. 

Many relationships are expressible in terms of the simplest of all 
mathematical forms, Y = A + BX, in which X and Y represent variables 
and A and B are constants determinable from the observations. The 
accuracy of prediction can be determined, and it is convenient that we have 
some general measure of this accuracy. One such measure which can be 
computed and which will yield information as to the degree of accuracy and 
the degree of relationship is the correlation coefficient, designated r. This 
measure of co-relation, as we shall soon see, not only tells us the degree of 
relationship, but will also, in conjunction with the two means and standard
deviations, permit us to write the linear equation for predicting Y from X
or X from Y.

Our present discussion will be concerned with the determination of 
relationship between such typical variables as height, weight, strength, age, 
intelligence, social status, attitudes—i.e., with those variables which show 
variation from individual to individual. The question of the relationship 
between variables of this type can be stated quite simply: Is there a tendency
for the individual who ranks high (or low) on one characteristic to be
high (or low) on another also? It should be noted that at times a relationship
may involve just one variable: Are heights of sons related to the
heights of their fathers? Are the IQs of adults related to their childhood 
IQs? 

THE SCATTER DIAGRAM 

The first task is that of tabulation. If we have observations on the height 
and weight of a large number of individuals, using cross-sectional or 
coordinate paper, we can lay off on the y axis convenient tabulating 
intervals for, say, height and on the x axis intervals for weight. The rules 
for choosing intervals stated on p. 6 should be followed here. Tabulation 
then consists first of finding on the y axis the interval in which an individual's
height falls and locating the interval on the x axis for his weight. A
tally or dot is then placed in the cell formed by the intersection of these two 
intervals. The result of such a two-way or cross tabulation is referred to as 
a scatter diagram or correlation table. It will contain as many tallies as 
there are pairs of observations. The tallies in each row, or horizontal 
array, can be counted and recorded, separately by rows, to the right of the 
diagram. This procedure will, of course, yield the frequency distribution 
for all individuals with respect to the variable on the y axis. A similar 
count, and recording at the top, of tallies for each column, or vertical 
array, will yield the distribution for the other variable. The sum of the 
frequencies for either of these marginal distributions should equal N, or 
the number of pairs of observations. 
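
The tabulation just described can be sketched in a few lines. The paired observations and the interval sizes below are hypothetical, chosen only to show the mechanics: one tally per pair, and marginal totals that each sum to N.

```python
from collections import Counter

# Hypothetical paired observations (weight in pounds, height in inches).
pairs = [(150, 68), (162, 70), (138, 65), (171, 72), (155, 68),
         (149, 66), (180, 73), (143, 64), (158, 69), (166, 71)]

X_WIDTH, Y_WIDTH = 10, 2   # interval sizes, chosen only for illustration

def interval_start(value, width):
    """Lower limit of the tabulating interval that contains value."""
    return (value // width) * width

# One tally per pair, placed in the cell where the two intervals intersect.
cells = Counter((interval_start(y, Y_WIDTH), interval_start(x, X_WIDTH))
                for x, y in pairs)

# Marginal distributions: tallies counted by row (y) and by column (x).
row_totals = Counter()
col_totals = Counter()
for (y_int, x_int), f in cells.items():
    row_totals[y_int] += f
    col_totals[x_int] += f

N = len(pairs)
print(sorted(row_totals.items(), reverse=True))   # distribution at the right
print(sorted(col_totals.items()))                 # distribution at the top
print(sum(row_totals.values()) == N == sum(col_totals.values()))
```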

Figures 8.1 and 8.2 are illustrative scatter diagrams, but not models so 
far as number of grouping intervals is concerned. In practice, from 12 to 
20 intervals should be used in order to reduce the grouping error to a 
negligible amount. It is to be understood that the intervals in these charts 
are 40-44, 30-39, 50-59, etc. The student should study these diagrams so 
as to grasp some of the mechanical details involved in their construction. 

It should be noted that the number and size of the intervals for the two 
variables need not be the same, and that the zero points on the scales of 
measurement need not appear or even be indicated on the axes.

Fig. 8.1. Correlation scatter diagram for two tests.

It can readily be seen that these two diagrams represent different
degrees of relationship. A precise method for measuring the
degree of relationship or association or correlation will be discussed in
detail in the pages to follow. We shall begin with a symbolic definition of a
basic correlation coefficient, indicate its computation, and then discuss its
interpretation, assumptions, and finally its limitations. Certain
elementary mathematical derivations will be either indicated or given
whenever it is thought that their inclusion will be useful in clarifying a
point or clinching an assumption.

The Pearson product moment correlation coefficient is defined by

$$r = \frac{\sum xy}{N S_x S_y} \tag{8.1}$$

in which $x$ and $y$ represent deviation measures from the respective means
of the two variables, i.e., $x = X - M_x$ and $y = Y - M_y$, the $S$s are the
standard deviations of the two distributions, and $N$ is
the number of individuals measured. With reference to a scatter diagram,
$M_x$ and $S_x$ hold for the marginal distribution at the top, whereas $M_y$ and $S_y$
hold for the distribution to the right. The numerator term, $\sum xy$, implies
that the product of each individual's $x$ and $y$ is determined, and that all
such products are summed algebraically. There will, of course, be $N$
such products. In terms of raw scores the formula becomes


$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} \tag{8.2}$$


which involves four familiar sums, and the sum of the products of the
paired raw scores. This formula is unwieldy for large $N$ and/or scores
which are numerically large. For reasons which will become apparent
later, the careful researcher will always make a scatter diagram, and once
this has been done it is economical to compute $r$ in terms of step-interval
deviations from arbitrary origins. An appropriate formula is


$$r = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{N\sum d_x^2 - (\sum d_x)^2}\,\sqrt{N\sum d_y^2 - (\sum d_y)^2}} \tag{8.3}$$


in which $d_x$ is defined as an individual's score deviation, in step intervals,
from an arbitrary origin on the X scale, and $d_y$ is defined similarly for the
Y scale. The student will note the similarity of the radical terms to the
formula for computing the standard deviation. Formula (8.3) calls for two sums, two
sums of squares, and a sum of cross products, all in terms of step or interval
deviations from arbitrary origins. The arbitrary origins may be taken
near the middle or at the bottom of each distribution. The former will involve
handling smaller figures but will have the disadvantage of introducing
negative numbers. The latter scheme is better if a calculating machine is available.
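
That formulas (8.1), (8.2), and (8.3) lead to the same value can be verified numerically. The sketch below uses hypothetical scores placed at interval midpoints (interval sizes of 5 and 10 are assumptions made for the example), so that the step deviations are whole numbers; the function names are introduced here for illustration only.

```python
import math

def r_deviation(X, Y):
    """Formula (8.1): sum of xy over N * Sx * Sy, with x and y as deviations."""
    N = len(X)
    mx, my = sum(X) / N, sum(Y) / N
    x = [v - mx for v in X]
    y = [v - my for v in Y]
    sx = math.sqrt(sum(v * v for v in x) / N)
    sy = math.sqrt(sum(v * v for v in y) / N)
    return sum(a * b for a, b in zip(x, y)) / (N * sx * sy)

def r_raw(X, Y):
    """Formula (8.2); the same arithmetic gives (8.3) for step deviations."""
    N = len(X)
    sx, sy = sum(X), sum(Y)
    sxx = sum(v * v for v in X)
    syy = sum(v * v for v in Y)
    sxy = sum(a * b for a, b in zip(X, Y))
    num = N * sxy - sx * sy
    den = math.sqrt(N * sxx - sx ** 2) * math.sqrt(N * syy - sy ** 2)
    return num / den

# Hypothetical interval midpoints for eight individuals.
X = [27, 32, 32, 37, 42, 42, 47, 52]
Y = [65, 75, 85, 75, 95, 105, 95, 115]

# Step deviations from arbitrary origins at the bottom intervals.
dx = [(v - 27) // 5 for v in X]
dy = [(v - 65) // 10 for v in Y]

print(r_deviation(X, Y), r_raw(X, Y), r_raw(dx, dy))
```

The three printed values agree apart from rounding in the last decimal places, since a change of origin and of unit leaves r unaltered.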

CALCULATION OF r

The computation of r will be illustrated for both hand and machine 
calculating methods. The hand calculation scheme here used may not be 
quite as economical as other available schemes, but the particular setup 
has the advantage that it forms an economical basis for machine computation,
and the author presumes that practically all those who are apt
to compute more than a few rs will have access to a calculating machine of 
the Monroe or Marchant or Friden type. Once the steps involved in the 
hand calculation form are grasped, it becomes easy to transfer them to 
machine work. The writer has never found the commercial correlation 
charts helpful. All that is necessary is a sheet of cross-section paper ruled 
four lines to the inch, on which we can readily lay out the axes, in intervals, 
for tabulating or tallying. When the scatter diagram has been made and 
the tally (or dot) marks have been summed across and up to get the marginal
frequencies (as shown in Figs. 8.1 and 8.2), the $d$ values, taken from
an arbitrary origin at the bottom-most interval for each variable, can be
written, preferably with colored lead, alongside the marginal frequencies
(see Table 8.1). The columns of $fd$ and $fd^2$ values along each margin can
be obtained by multiplying in exactly the same manner as was previously 
done for calculating the standard deviation. The sums of these columns 
provide four of the five sums needed for r. 

In order to obtain $\sum d_x d_y$, each individual's $d_x$ must be multiplied by his
$d_y$, and all such products then summed. In the 140 interval on the y axis
we find one individual whose score on the X variable falls in the 50 interval
on the x axis. In terms of step deviations his $d_y$ value is 8 and his $d_x$ value
is 5, and therefore 5 times 8, or 40, represents his $d_x d_y$ product. Another
individual with the same $d_y$ value has a $d_x$ value of 6, whence 6 times 8 is his
contribution to $\sum d_x d_y$. The third individual in the 140 interval has a $d_x$
value of 7, whence 7 times 8 is his product. These three individuals
contribute 5 × 8 + 6 × 8 + 7 × 8, or 144, to the sum of products. The
$d_y$ value of 8 is a common factor to these three products, whence
8(5 + 6 + 7) or 8 × 18 yields 144. This suggests a scheme for computing
the $d_x d_y$ sum which involves first summing the $d_x$ values for a particular Y
interval or array and then multiplying this sum by the $d_y$ value. Thus the
$d_x$ values of the individuals in the 130 interval sum to 34, and in the 120
interval to 34, and so on down to the 60 interval, which yields 2 as the sum
of the $d_x$ values. The determination of these $d_x$ sums is greatly facilitated
by the use of a runner on which the $d_x$ values 0, 1, 2, 3, ..., have been
labeled to correspond exactly with the deviations in step intervals alongside
the marginal distribution at the top of the diagram. Since each of these
$d_x$ sums is to be multiplied by a $d_y$ value and then all the products summed,
it is convenient first to record the $d_x$ sums to the right as a separate column
and then to multiply each $d_x$ sum by the corresponding $d_y$ value, thus
leading to the last column of figures. Before these final multiplications are
made, the column of $d_x$ sums should be added to see whether it agrees with
the $\sum d_x$ already computed from the marginal distribution of X scores.
Thus an internal check is provided for the column of $d_x$ sums; all other
computations should be done twice in order to insure accuracy.

Table 8.1.* Computation of r

$$r = \frac{(61)(1097) - (224)(253)}{\sqrt{(61)(1012) - (224)^2}\,\sqrt{(61)(1297) - (253)^2}}$$

* Space limitations account for the use of too few intervals in this table. A complete
labeling of intervals would be 25-29, etc., and 60-69, etc.
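
The row-sum scheme and the final substitution can be checked as follows. The sketch takes the five sums and N from the substitution shown for Table 8.1, reading the first radical as the X terms and the second as the Y terms in keeping with formula (8.3); that assignment is an assumption, since the body of the table is not reproduced here.

```python
import math

# The 140-interval example: three individuals with d_y = 8 and d_x = 5, 6, 7.
dx_in_row, dy_row = [5, 6, 7], 8
one_by_one = sum(dx * dy_row for dx in dx_in_row)   # 5*8 + 6*8 + 7*8
by_row_sum = dy_row * sum(dx_in_row)                # 8 * (5 + 6 + 7)

# N and the five sums as they appear in the substitution for Table 8.1.
N = 61
sum_dx, sum_dy = 224, 253
sum_dx2, sum_dy2 = 1012, 1297
sum_dxdy = 1097

numerator = N * sum_dxdy - sum_dx * sum_dy
denominator = (math.sqrt(N * sum_dx2 - sum_dx ** 2)
               * math.sqrt(N * sum_dy2 - sum_dy ** 2))
r = numerator / denominator

print(one_by_one, by_row_sum, round(r, 2))
```

Both ways of forming the products for the 140 interval give 144, and the substitution yields an r of about .78.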

When a calculator is available, the work sheet need not include the $fd$
and $fd^2$ columns, since the sums of these two columns can readily be
obtained by the method discussed on pp. 22-23. This means that the
column of $d_x$ sums can be placed alongside the $d_y$ values; then each $d_x$
sum can be multiplied by the juxtaposed $d_y$ value, with the products
allowed to accumulate in the dial as the needed $\sum d_x d_y$. Thus the right-hand
column figures need not appear on the work sheet.

The substitution of the five sums into formula (8.3) is straightforward.
The denominator factors are evaluated as explained on p. 23, and the
numerator is obtained by punching $\sum d_x d_y$ into the keyboard and multiplying
by $N$; then, with the product left in the lower dial, $\sum d_x$ is subtracted
$\sum d_y$ times. If needed, the two means can be obtained by substituting $\sum d_x$
and $\sum d_y$ into (3.3), and the two standard deviations by multiplying the
proper radical by the interval size and dividing by $N$ [equivalent to
substituting the sum and sum of squares into (3.5)].
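
As a sketch of the last step, the two standard deviations can be recovered from the radicals of formula (8.3). The interval sizes of 5 and 10 are inferred from the note to Table 8.1 (intervals 25-29, etc., and 60-69, etc.), and the assignment of sums to X and Y follows the order of the radicals; both are assumptions. The function name is hypothetical.

```python
import math

N = 61
sum_dx, sum_dx2, width_x = 224, 1012, 5     # X intervals 25-29, etc.
sum_dy, sum_dy2, width_y = 253, 1297, 10    # Y intervals 60-69, etc.

def sd_from_steps(n, sum_d, sum_d2, width):
    """Radical of (8.3) times the interval size, divided by N."""
    return width * math.sqrt(n * sum_d2 - sum_d ** 2) / n

s_x = sd_from_steps(N, sum_dx, sum_dx2, width_x)
s_y = sd_from_steps(N, sum_dy, sum_dy2, width_y)
print(round(s_x, 2), round(s_y, 2))
```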


Chapter 9 

CORRELATION: 
INTERPRETATIONS 
AND ASSUMPTIONS 


Intelligent use of the correlation coefficient and critical understanding 
of its use by others are impossible without knowledge of its properties. It 
is not sufficient that we be able merely to recognize r as a measure of 
relationship. It is a peculiar kind of measure which permits certain 
interpretations provided certain assumptions are tenable and provided we 
consider possible disturbing factors. Since the interpretations of r are so 
closely related to assumptions, no attempt will be made to present a 
separate discussion of these two aspects. The factors which affect r, and 
which are therefore limitations additional to assumptions, will be discussed 
in Chapter 10. 

STUDY OF SCATTERGRAM 

We shall begin by making a somewhat detailed study of certain properties
of a typical scatter diagram. The columns and rows of the diagram
have already been referred to as vertical and horizontal arrays, the intersection
of two arrays has been called a cell, and the meaning of the
marginal distributions has been given. If the scatter diagram depicted in
Table 9.1 is examined, it will be noted that each vertical (and also each
horizontal) array contains a frequency distribution, and that the marginal
totals really represent the number of cases in these array distributions.
These array distributions are very much like any other typical distribution:
bell-shaped with a clustering or scattering about a central value. The mean
and standard deviation again become useful descriptive terms. Thus,
in Table 9.1, the mean height of sons whose fathers were 64 inches tall is
found to be 66.8 inches. This is simply the mean of the twelve cases which
fall in this particular array. Similarly for all the vertical arrays we have the
means as recorded along the bottom of Table 9.1. The means of the
horizontal array distributions have been recorded to the right of the
scatter diagram. For example, the mean height of the 10 fathers whose
sons were 72 inches tall is 70.0 inches.

Table 9.1. Correlation table for height of fathers (X) and height of sons (Y)

If the means of the vertical arrays are plotted (see crosses in Fig. 9.1) 
two things will be noticed: the means are progressively greater as we pass 
from short to tall fathers, and they fall approximately on a straight line. 
It will be noted (see dots in Fig. 9.1) that the means for the horizontal 
arrays also approximate a line and show progression. Now, with reference 
to the means of the vertical arrays, each represents the mean height of 
sons of fathers of a particular height and therefore may be used as a basis
for predicting the height, if unknown, of a man if we have been told the
height of his father. Thus, if the father is 66 inches tall, the best estimate
of his son's height is 67.6, the observed mean height of men whose fathers
are 66 inches in height.

Fig. 9.1. Plot of array means for data of Table 9.1.

Obviously such an estimation would be subject to considerable error, 
since we have also the observable fact that the heights of sons of fathers 
66 inches tall show a large amount of variation about the array average. 
This variation tells us something about the possible magnitude of the error 
involved in using 67.6, the array mean, as our estimated value. The 
unknown height, of which we take an array mean as an estimate, may 
actually fall anywhere within rather wide limits on either side of the array 
mean. These limits can be described in terms of the standard deviation of 
the array distribution. The standard deviation for the distribution of 
heights of sons whose fathers were 66 inches in height is about 2.1. Now, 
if we take 67.6 as the best estimate, we can say that, if we were to predict 
the height of 100 sons (fathers 66 inches), about 68 per cent of the time the 
error would be within the limits 67.6 ± 2.1, 95 per cent within 67.6 ± 4.2,
and nearly always within the limits 67.6 ± 6.3. Likewise, when the Ss for
the several arrays have been computed, a statement of the limits of the
error in predicting any son’s height from his father’s height can be made. 
Such a procedure will yield as many measures of error as there are vertical 
arrays. We shall soon see that a convenient assumption can be made which 
will usually allow us to use a single indication of the error of estimate. 
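
The computation of array means and array standard deviations can be sketched as below. The father-son pairs are hypothetical and are not the data of Table 9.1; the sketch only shows how each vertical array supplies its own estimate and its own measure of error.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (father, son) heights in inches; not the data of Table 9.1.
pairs = [(64, 65), (64, 67), (64, 68), (66, 66), (66, 67), (66, 69),
         (66, 70), (68, 67), (68, 69), (68, 70), (68, 72), (70, 69),
         (70, 71), (70, 73)]

# Collect the sons' heights falling in each vertical array
# (one array per father height).
arrays = defaultdict(list)
for father, son in pairs:
    arrays[father].append(son)

for father in sorted(arrays):
    sons = arrays[father]
    m = mean(sons)       # array mean: the estimate for a son of such a father
    s = pstdev(sons)     # array standard deviation, with N in the divisor
    print(father, len(sons), round(m, 2), round(s, 2),
          (round(m - s, 2), round(m + s, 2)))
```

Each printed line gives the father's height, the number of cases in the array, the array mean, the array standard deviation, and the limits of one standard deviation on either side of the mean.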

Let us return again to the line of the means. Two such lines have been 
drawn in Fig. 9.1; one line “fits” the means of the vertical, the other the 
means of the horizontal, arrays. Let us for the present confine our 
attention to the means of the vertical arrays. They do not lie exactly on the 
drawn line; some are above, some below. If they fell exactly on the line, a 
prediction based on an array mean would be precisely the same as a 
prediction obtained by noting the Y value of the line where it cuts the 
middle of the array. Furthermore, if the means were exactly on a straight 
line, we might write the equation for this line in the form Y = BX + A, 
where A equals the y intercept (value of Y where the line crosses the y axis) 
and B equals the slope of the line (the inclination of the line to the x axis). 
With A and B known, the value of Y for a particular X can be readily 
estimated. 

But, since the means do not lie exactly on a straight line, the foregoing 
reasoning would not seem offhand to yield us anything of practical value. 
From many points of view, however, it is desirable that we determine the 
equation of the straight line which best “fits” the means, i.e., the equation 
of a line which passes near all the means. Then we can use this equation 
instead of the array means in making predictions. The justification for this 
procedure depends on the validity or tenability of an assumption: we 
assume that the failure of the means to fall exactly on a straight line is due 
to chance fluctuations in the means. Each array mean is based on a sample 
and consequently deviates more or less from the true or population value 
of the mean for the array. This is equivalent to saying that, if all the array 
means were based on a much larger number of cases, we could assume that 
they would approximate more exactly a straight line. This is an assumption 
which can always be made provided the array means for a particular 
scatter do not show marked deviations from linear form. (Adequate 
checks in terms of probability, to be described later, can be utilized to 
ascertain whether the fluctuations from linearity are larger than is reasonable
on the basis of chance.)

THE BEST-FIT LINE 

We can now consider one of the advantages of using a line instead of 
the several array means as a basis for prediction. The location of the line is 
dependent on all the means, or rather on all the cases. It therefore seems
reasonable to believe that the line would be more stable from the sampling
point of view than would the array means, each of which is based on a 
rather small number of cases. 

If we accept the assumption of linearity of array means, our problem is 
that of determining A and B so that we can write the equation of the line of 
means. We need the equations of two lines: Y = BX + A for the means 
of the vertical arrays and X = B'Y + A' for the horizontal array means.
We shall consider the determination of the constants A and B for the first 
equation, but before doing so something must be said concerning what is 
meant by a “best-fit” line. The constant A gives the y intercept, i.e., 
tells us where the line cuts the y axis. Suppose we think of several possible 
lines having the same slope (the same B) as the line in Fig. 9.1 which passes 
near the crosses. Obviously, if we considered a line passing near the top or 
bottom of the scatter diagram, it would be a “worse fit” than that drawn in 
Fig. 9.1. Likewise, if we think of pivoting the line about some point, 
thereby altering its slope, it can be readily seen that rotating it to a vertical 
or horizontal position would give a worse fit. It should now be clear that 
the assigning of some values to A and B will lead to a worse fit than that 
obtained by certain other values, or conversely that some values will yield 
a much better fit than others. 

One criterion accepted as a basis for a best-fit line is that the sum of the 
squares of the deviations from the line shall be as small as possible. With 
respect to determining the best-fit line to the means of the vertical arrays, 
this criterion or definition of fit implies that the values of A and B are to be 
such that the sum of the squared deviations of the observed heights of 
sons—deviations in an up and down or vertical direction—about the 
line will be a minimum. Stated in symbols, let $Y' = BX + A$, where $Y'$
(read Y prime) is the value estimated from a given X, and let Y be the
observed value. Then $(Y - Y')^2$ represents the squared deviation of any
Y from the line or estimated value. The problem is so to choose A and B
as to make $\sum (Y - Y')^2$ as small as possible. It is more convenient to deal
with both the equation, $y' = bx + a$, and the sum, $\sum (y - y')^2$, in deviation
units, with $y'$ and $y$ as deviations from $M_y$ and $x = X - M_x$. This is
merely the translation of the axes which makes the origin or reference
point coincide with $M_x$ and $M_y$. The student should visualize the meaning
of this shift of axes. Note that the pattern of tallies is not changed by this
simple transformation. Do you think that the slope B will equal the slope
b? Will A = a? Let us keep the first question in abeyance and examine
now the second question. Both A and a represent the y intercepts of the
desired prediction line. If it is not immediately obvious to the student that
A may not equal a, he should imagine that in Table 9.1 and Fig. 9.1 the
axes have been moved so that the origin is at the center of the scatter
diagram, and then ask himself where the line through the means of the
vertical arrays would cut the new y axis. (Incidentally, it should be noted 
that the value of A cannot be read directly from Fig. 9.1 for the simple 
reason that the reference frame as drawn does not include the origin. The 
real y and x axes of the original measures would be, respectively, to the left 
of, and lower than, the indicated axes.) 

It is of interest to speculate concerning the value of $a$ in the equation
$y' = bx + a$. Common sense would suggest that, if an individual were
average on X, the best guess would be that he would be average on Y.
That is, if $X = M_x$, we would expect $Y'$ to equal $M_y$. But, if an individual's
X measure fell at $M_x$, his deviation, or $x$ value, would be 0, and the estimated
value of Y as being equal to $M_y$ would in terms of deviation scores
become 0. This would imply that the prediction line would pass through
the origin of the deviation score reference axes, and consequently that the
y intercept would be zero; hence $a = 0$. For the purpose of simplifying
the determination of the best value for $b$, we ask the reader to accept, on
the basis of the foregoing reasoning, that $a = 0$ for the best-fitting line. If
we carried both $a$ and $b$ along in the following development, $a$ would in
fact turn out to be zero.

This permits us to write $y' = bx$ as the equation for estimating $y$, in
deviation units, from $x$, or deviation values of X. Our task becomes that of
determining the value of $b$ which will make $\sum (y - y')^2$ a minimum.
Incidentally, it should be obvious that the discrepancy of any particular $y$
value from the desired line has the same numerical value as the deviation
of its corresponding original Y value from the line, and that
$\sum (y - y')^2 = \sum (Y - Y')^2$. When we have determined the optimal value for $b$ in
$y' = bx$, we can readily pass back to the original reference frames, the
gross score axes, by substituting for $y'$ the value $Y' - M_y$, and for $x$,
$X - M_x$. With $a$ fixed as zero, i.e., with the y intercept equal to zero, we
can think of the line as passing through the origin (deviation axes); i.e.,
its up and down location is fixed. Obviously, many lines could be drawn
through the origin, and they would differ only as to slope, i.e., as to $b$.
Of all possible lines which may be drawn through the origin, some will be
closer than others to the observations (tallies) in toto. The student might
imagine several lines any of which would seem to constitute a good fit. As 
he takes lines with either greater or lesser slope than those of apparently 
good fit the fits will become worse; and of those which seem to fit, some 
will actually be better than others. The student might think that it would 
only be necessary to draw what seems by inspection to be the best-fitting 
line, and then obtain its slope by actually measuring the angle which it 
makes with the horizontal (with needed adjustment to allow for the 
measurement units). The trouble with this procedure is that individuals
would tend to disagree regarding which of several lines was really best;
also, the measurement of angles would be none too exact. What we need
is a better procedure, a method that will yield the value of $b$ which
leads to the best possible fit in the sense of reducing the sum of the squares
of the discrepancies to a minimum.

We set up the function

$$f = \frac{\sum (y - y')^2}{N} = \frac{\sum (y - bx)^2}{N}$$

in which we have $N$ deviations of the form $y - y'$ or $y - bx$ (since
$y' = bx$). These deviations when squared, summed, and divided by $N$ give
the function which is to be minimized.
The value to be assigned to $b$ can best be ascertained by the calculus.*
This is done by taking the derivative of the function with respect to $b$,
setting this derivative equal to zero, and then solving for $b$. Thus

$$\frac{df}{db} = \frac{-2\sum x(y - bx)}{N}$$

which, set equal to zero and divided by $-2$, gives

$$\frac{\sum x(y - bx)}{N} = 0 \quad\text{or}\quad \frac{\sum xy}{N} - \frac{b\sum x^2}{N} = 0$$

The first or cross-product term involves the correlation coefficient as
defined by formula (8.1), from which we see that
$\sum xy/N = rS_xS_y$, and since $\sum x^2/N = S_x^2$, we have

$$rS_xS_y - bS_x^2 = 0$$

which gives

$$rS_y - bS_x = 0 \quad\text{or}\quad b = r\frac{S_y}{S_x}$$

as the optimal value for $b$. We therefore have

$$y' = r\frac{S_y}{S_x}x \tag{9.1}$$

as the equation for the best-fit line. This equation is in terms of deviation
measures, and by proper substitution we get

$$Y' - M_y = r\frac{S_y}{S_x}(X - M_x)$$

or

$$Y' = r\frac{S_y}{S_x}X + \left(M_y - r\frac{S_y}{S_x}M_x\right) \tag{9.2}$$

as the equation in terms of the original or gross scores. This is the form
which we would use in predicting Y from X. Note that $B = rS_y/S_x$
is the slope of this line and that the constant $A$ is equal to the term in parentheses.

* The reader unfamiliar with the calculus may accept the first part of the
following derivation on faith or, if skeptical, dig into a calculus text to satisfy
himself that no magic is involved here.
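
The result can be checked numerically without the calculus. The sketch below uses hypothetical paired scores, computes the slope given by (9.1), and compares the mean squared discrepancy for that slope with the values obtained for a range of other trial slopes. The names `mean_sq_discrepancy`, `b_best`, and `b_grid` are introduced here for illustration.

```python
import math

# Hypothetical paired scores, used only to illustrate the result.
X = [62, 64, 65, 66, 67, 68, 69, 70, 71, 73]
Y = [65, 66, 68, 66, 69, 68, 70, 69, 72, 71]
N = len(X)

m_x, m_y = sum(X) / N, sum(Y) / N
x = [v - m_x for v in X]                 # deviation scores
y = [v - m_y for v in Y]
s_x = math.sqrt(sum(v * v for v in x) / N)
s_y = math.sqrt(sum(v * v for v in y) / N)
r = sum(a * b for a, b in zip(x, y)) / (N * s_x * s_y)

def mean_sq_discrepancy(b):
    """The function f: mean of the squared deviations (y - bx)^2."""
    return sum((yi - b * xi) ** 2 for xi, yi in zip(x, y)) / N

b_best = r * s_y / s_x                   # slope given by (9.1)

# Try many other slopes; none should give a smaller f than b_best.
trial_slopes = [i / 100 for i in range(-100, 201)]
b_grid = min(trial_slopes, key=mean_sq_discrepancy)

B = b_best                               # slope in gross-score form (9.2)
A = m_y - b_best * m_x                   # intercept in gross-score form (9.2)
print(round(b_best, 3), b_grid, round(A, 2))
```

The best trial slope lies next to the value given by (9.1), and the gross-score line passes through the point whose coordinates are the two means.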

By similar reasoning the equation of the best-fit line to the means† of
the horizontal arrays is found to be

$$x' = r\frac{S_x}{S_y}y \tag{9.3}$$

which becomes

$$X' = r\frac{S_x}{S_y}Y + \left(M_x - r\frac{S_x}{S_y}M_y\right) \tag{9.4}$$

When both equations are written in the B and A notation, we may attach
subscripts to differentiate between the Bs and between the As:

$$Y' = B_{yx}X + A_{yx}$$

$$X' = B_{xy}Y + A_{xy}$$
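
A short sketch, again on hypothetical scores, computes both lines from (9.2) and (9.4). It also checks one algebraic consequence of the two formulas that is not stated above: since the slopes are $rS_y/S_x$ and $rS_x/S_y$, their product is $r^2$. The function name `regression_lines` is an assumption made for the example.

```python
import math

def regression_lines(X, Y):
    """Return r, (B_yx, A_yx) for Y' = B_yx X + A_yx, and (B_xy, A_xy)."""
    N = len(X)
    m_x, m_y = sum(X) / N, sum(Y) / N
    s_x = math.sqrt(sum((v - m_x) ** 2 for v in X) / N)
    s_y = math.sqrt(sum((v - m_y) ** 2 for v in Y) / N)
    r = sum((a - m_x) * (b - m_y) for a, b in zip(X, Y)) / (N * s_x * s_y)
    b_yx = r * s_y / s_x                 # slope for predicting Y from X
    b_xy = r * s_x / s_y                 # slope for predicting X from Y
    return r, (b_yx, m_y - b_yx * m_x), (b_xy, m_x - b_xy * m_y)

# Hypothetical paired scores.
X = [62, 64, 65, 66, 67, 68, 69, 70, 71, 73]
Y = [65, 66, 68, 66, 69, 68, 70, 69, 72, 71]

r, (b_yx, a_yx), (b_xy, a_xy) = regression_lines(X, Y)
print(round(r, 3), round(b_yx, 3), round(a_yx, 2),
      round(b_xy, 3), round(a_xy, 2))
print(math.isclose(b_yx * b_xy, r ** 2))
```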


Regression. Equations (9.1) and (9.3) in deviation score form and
(9.2) and (9.4) in gross score form are known as regression equations, and
the constants denoting slope are known as regression coefficients. It is
assumed that prediction will be as accurate by means of a regression
equation as by way of array means, and it can readily be seen that by using
a regression equation we can predict from intermediate values, e.g., 64.5.
This is of especial advantage with grouped data: the array mean is associated
only with the midpoint value of the grouping interval, whereas the
regression line is not so limited since it is continuous.

† More strictly speaking, we are fitting a line to means weighted according to their
respective Ns; i.e., we are fitting a line to the observations.




The correlation coefficient, taken in conjunction with the two means and the two
standard deviations, enables us to write the equations by which either
variable can be predicted from the other. The regression coefficients
indicate the rate of change (units of change in one variable per unit of
change in the other), and in case the two standard deviations are equal, r
itself indicates the rate of change. For the data of Table 9.1 the
regression equations are

Y' = .52X + 33.24 (to estimate son's height)

X' = .60Y + 26.63 (to estimate father's height)

The student should study Fig. 9.1 sufficiently to convince himself that .52
is the slope of the line passing near the crosses.

ACCURACY OF PREDICTION

We turn now to the accuracy of prediction by means of a regression
equation. It has already been indicated
that, when the mean of an array is used as a prediction, the error of estimate
is a function of the spread within that array. By introducing an assumption
it becomes possible to substitute one measure of error in place of the several,
numerically different array standard deviations. An examination of the
array distributions in Table 9.1 reveals that the vertical arrays differ from
each other very little in dispersion (likewise, the horizontal arrays). If we
were to compute the standard deviations of the several arrays, we would
find differences among them, but these differences could be attributed
to chance or sampling fluctuations; that is, we assume that, if we had a
much larger N, the array dispersions would be equal. Ordinarily
this assumption of homoscedasticity can be met, and one value can be used as a
measure of the error for all the vertical (or horizontal) arrays. To compute
an average of the several array standard deviations would be a
somewhat laborious job. Since we are to use the regression line, rather than the
array means, as a basis for prediction, we really need something corresponding
to the S about this line. Such a value can be obtained by noting
that $y - y'$ (or $Y - Y'$) represents the discrepancy between estimated and
observed values and that $\sum (y - y')^2/N$ is the mean of the squared deviations,
the root of which will be the standard deviation of the discrepancies
between estimated and observed values. This will be taken as the one
standard deviation to replace the several standard deviations as our
measure of the error of prediction. This particular standard deviation,
defined as the square root of $\sum (y - y')^2/N$, is called the standard error of
estimate. It may be determined in two ways. First we can take a roundabout
way which involves these steps: the prediction of each Y by use of
equation (9.2), or each $y$ by use of (9.1); the calculation of the discrepancies
$(Y - Y')$ or $(y - y')$; squaring, summing, dividing by $N$, and taking the
square root. A quicker method for determining the standard error of
estimate is readily derived algebraically.

Let $S_{y \cdot x}$ stand for the standard error of Y as estimated from X; then by
definition,

$$S_{y \cdot x}^2 = \frac{\sum (Y - Y')^2}{N} = \frac{\sum (y - y')^2}{N}$$

but

$$y' = r \frac{S_y}{S_x} x$$

by formula (9.1), whence

$$
\begin{aligned}
S_{y \cdot x}^2 &= \frac{1}{N} \sum \left( y - r \frac{S_y}{S_x} x \right)^2 \\
&= \frac{1}{N} \sum \left( y^2 - 2r \frac{S_y}{S_x} xy + r^2 \frac{S_y^2}{S_x^2} x^2 \right) \\
&= S_y^2 - 2r \frac{S_y}{S_x} \, r S_x S_y + r^2 \frac{S_y^2}{S_x^2} S_x^2 \\
&= S_y^2 - 2r^2 S_y^2 + r^2 S_y^2 \\
&= S_y^2 (1 - r^2)
\end{aligned}
$$

then

$$S_{y \cdot x} = S_y \sqrt{1 - r^2} \tag{9.5}$$

By a similar line of reasoning it can be shown that

$$S_{x \cdot y} = S_x \sqrt{1 - r^2} \tag{9.6}$$

which gives the standard error of X as estimated from Y.

Thus the correlation coefficient not only enters into the prediction
equations (9.1) to (9.4), but also permits us to gauge the accuracy of
prediction. It should be noted in passing that we can write the equation
of a best-fit line without first determining r and that the error of prediction
can also be ascertained without recourse to r. Such a method for determining
the error of estimate has already been indicated: the square root of
$\sum (Y - Y')^2 / N$, in which Y - Y' represents the computed discrepancy
between observed and predicted values. This need not involve r unless
the prediction equation is written in terms of r, as was done in (9.2). The
equation Y' = A + BX can be written in the form

$$Y' = \frac{\sum X^2 \sum Y - \sum X \sum XY}{N \sum X^2 - (\sum X)^2} + \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - (\sum X)^2} X \tag{9.7}$$

in which X and Y stand for gross or original measures. Formula (9.7) for
the best-fitting line (least squares solution) does not involve means, Ss, or
the correlation coefficient. If, as is frequently the case, we are interested in
obtaining the equation for Y only, it will be noticed that it is unnecessary
to compute the sum of the Y squares, which is not, however, a tremendous
saving of time. Perhaps the quickest way for determining the equation is
by direct substitution in (9.7), but the determination of the error of
estimate (sometimes called the closeness of fit of the line) is certainly
facilitated by calculating r and $S_y$ and substituting in (9.5).
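The two routes to the standard error of estimate can be compared numerically. The following is a minimal sketch in Python and is not part of the original text; the eight paired heights and all function names are hypothetical, chosen only for illustration. It fits the line from raw sums as in formula (9.7), obtains the error of estimate the roundabout way from the discrepancies, and then again from r and the standard deviation of Y as in formula (9.5). Standard deviations are taken with N in the denominator, as in the text.

```python
import math

def fit_line(xs, ys):
    """Return A and B of Y' = A + BX computed from raw sums, as in (9.7)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_xx - sum_x ** 2
    a = (sum_xx * sum_y - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b

def sd(values):
    """Standard deviation with N (not N - 1) in the denominator."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def pearson_r(xs, ys):
    """Product-moment r as the mean cross product of deviations over SxSy."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    mean_cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return mean_cross / (sd(xs) * sd(ys))

def see_roundabout(xs, ys):
    """Square root of the mean squared discrepancy Y - Y'."""
    a, b = fit_line(xs, ys)
    squared = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(xs))

def see_shortcut(xs, ys):
    """Formula (9.5): Sy times the square root of 1 - r squared."""
    return sd(ys) * math.sqrt(1 - pearson_r(xs, ys) ** 2)

# Hypothetical heights in inches, for illustration only.
fathers = [65, 66, 67, 68, 69, 70, 71, 72]
sons = [67, 66, 68, 70, 68, 70, 71, 70]

print(see_roundabout(fathers, sons), see_shortcut(fathers, sons))
```

Both printed values should agree apart from rounding in the last decimal places, which is the algebraic identity derived above.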

The standard error of estimate is to be interpreted as a standard deviation,
and in so doing we are tacitly assuming that the array distributions
are not only equal in dispersion but also normal. For the correlation in
Table 9.1 we have $S_{y \cdot x} = 1.9$, which is to be considered the
standard deviation of the Y values about the regression line, Y' = .52X
+ 33.24. By use of this equation we would predict that the height of the
son of a man 70 inches tall (X = 70) would be 69.6, and the error of
estimate, 1.9, is interpreted by saying that, if we made many such
predictions, 68.26 times out of a hundred the actual height of sons of
70-inch fathers would be within the limits 69.6 ± 1.9, and nearly always
within the limits 69.6 ± 3(1.9).
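The arithmetic of this prediction can be retraced in a few lines. The sketch below is an illustration only; the constants .52, 33.24, and 1.9 are those quoted in the text, while the function and variable names are hypothetical.

```python
def predict_son_height(father_height):
    """Regression of son's height on father's height, Y' = .52X + 33.24."""
    return 0.52 * father_height + 33.24

S_YX = 1.9  # standard error of estimate quoted in the text

estimate = predict_son_height(70)
# Limits of one and of three standard errors about the predicted value.
one_sigma = (estimate - S_YX, estimate + S_YX)
three_sigma = (estimate - 3 * S_YX, estimate + 3 * S_YX)

print(round(estimate, 1))
print([round(v, 1) for v in one_sigma])
print([round(v, 1) for v in three_sigma])
```

The first pair of limits is the range expected to hold about 68 per cent of the sons of 70-inch fathers, provided the arrays are normal and homoscedastic.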

This is a second method for interpreting the correlation coefficient: in 
terms of the accuracy of prediction or closeness of fit of regression lines. 

If no correlation exists, the errors of estimate are $S_{y \cdot x} = S_y$ and $S_{x \cdot y} = S_x$.
In this connection it can be seen from formulas (9.2) and (9.4) that when
r = 0, the estimated Y, Y', becomes $M_y$ and X' becomes $M_x$. For example,
if it has been established that the correlation between toe length and IQ is
zero, we would always take 100 (the mean) as our best guess for an
individual's IQ regardless of toe length. The error of estimate would of
course be the standard deviation of the distribution of IQs, and it would be
said that toe length is useless in predicting IQ. The scatter diagram for
IQ as Y and toe length as X would exhibit the following characteristics:
first, the regression line Y' = A + BX would be horizontal, i.e., B would
equal zero, and the means of the arrays would fluctuate about the value
$M_y$, or A would equal $M_y$; and, second, all the array distributions would
have dispersions approximately equal to $S_y$. What would be the best guess
as to the other regression line and the standard deviations of the horizontal
arrays?

Now suppose the correlation between the variables were perfect (r = +1
or -1). The tallies in the scatter diagram would lie in a line, there would
be no spreading about this line, the two regression lines would coincide,
and no error would be involved in estimating X from Y or Y from X.
That $S_{y \cdot x}$ and $S_{x \cdot y}$ would both be zero in case of perfect correlation is quite
evident when we consider formulas (9.5) and (9.6).

At this point the student should note the difference between positive 
and negative correlation. In the case of a positive r, a high score goes with 
high and low with low, whereas, for a negative r, high goes with low and low 
with high. With reference to the scatter diagram, a negative r typically 
involves a swarm of tallies stretching from the upper-left to the lower-right 
corner whereas for a positive r the trend is from lower left to upper right 
(this assumes that the axes have been laid off in the conventional fashion). 
With reference to the regression equations, a negative r yields negative 
regression coefficients or negative slope for the lines. The student should 
be warned that an apparently negative r may in reality be positive. Thus, 
if one variable is a test or performance scored in terms of time (or errors) 
and the other variable is scored in terms of amount done, the scatter 
diagram might show large time scores as going with small amounts of 
work done, i.e., high with low, which might be wrongly taken to indicate 
negative rather than positive correlation. Instead of asking whether high 
goes with high and low with low, it is safer to ask whether best goes with 
best. This rule, however, is difficult to apply when we are dealing with the 
interrelation of personality traits, especially those which do not readily 
permit of a statement as to which is the desirable end of the trait scale. The
sign of the correlation coefficient in such cases always needs a qualifying 
statement which explicitly tells the direction of the relationship between the 
variables. Obviously, as far as accuracy of prediction is concerned, the 
error is the same for a negative and positive r of the same magnitude. 

Alienation. To return to the interpretation of the correlation coefficient
by way of the standard error of estimate, we see that the factor in
formulas (9.5) and (9.6) which involves r is $\sqrt{1 - r^2}$. It is the value of this
which, when multiplied by the proper S, leads to the error of estimate. The
expression $\sqrt{1 - r^2}$ is called the coefficient of alienation. If r is zero, its
value is 1 and the error of estimate is the S for the variable being estimated.
Table 9.2 gives the value of the coefficient of alienation for varying values
of r. The student will do well to fix in mind the trend in this table. It will
be noted that, compared to a correlation of zero, an r of .60 reduces the
error of estimate by 20 per cent, whereas an r of .30 reduces it by about 5
per cent; that r must be as high as .866 before the error of estimate is
reduced by one-half; and that the difference in reduction between an r of
.70 and an r of .90 is approximately the same as that between .20 and .70.


Table 9.2. Values of the coefficient of alienation

    r     $\sqrt{1 - r^2}$        r     $\sqrt{1 - r^2}$

   .00        1.000              .60         .800
   .10         .995              .70         .714
   .20         .980              .80         .600
   .30         .954              .866        .500
   .40         .917              .90         .436
   .50         .866              .95         .312
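The entries of Table 9.2 are easily regenerated. The sketch below is an illustration only, with a hypothetical function name; it prints each r, its coefficient of alienation, and the percentage by which the error of estimate is reduced relative to a correlation of zero.

```python
import math

def alienation(r):
    """Coefficient of alienation: the square root of 1 - r squared."""
    return math.sqrt(1.0 - r * r)

R_VALUES = (0.00, 0.10, 0.20, 0.30, 0.40, 0.50,
            0.60, 0.70, 0.80, 0.866, 0.90, 0.95)

for r in R_VALUES:
    k = alienation(r)
    reduction = 100.0 * (1.0 - k)  # per cent reduction in error of estimate
    print(f"r = {r:5.3f}   k = {k:5.3f}   reduction = {reduction:5.1f} per cent")
```

Reading down the last column makes the trend of the table plain: the reduction stays small until r is well above the values commonly met in practice.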


This interpretation of r is most useful and at the same time most disturbing, 
since the errors of estimate for rs in the vicinity of .40 to .70, values usually 
found and utilized in predicting success from test results, are discouragingly 
large. 

A somewhat different way of grasping the meaning of r, as it is applied to
accuracy of prediction, is to square both sides of formula (9.5) and then
solve explicitly for r. This leads to

$$r = \sqrt{1 - \frac{S_{y \cdot x}^2}{S_y^2}} \tag{9.8}$$

from which it is readily seen that the correlation coefficient depends on the
accuracy of prediction relative to the total variance of the variable being
predicted.

It might be well at this time to bring together a few remarks concerning 
the assumptions involved in using and interpreting a correlation coefficient 
in terms of either rate of change or accuracy of prediction. When an r is 
reported, and no evidence to the contrary is given, we have a right to 
expect that the assumptions of linearity of regression and homoscedasticity 
have been met. The interpretation of r as rate of change definitely assumes 
linearity, and the interpretation in terms of the error of estimate definitely 
assumes both linearity and homoscedasticity. In certain special cases 
where the investigator is interested only in a one-way prediction, say Y 
from X, and there is no likelihood of ever reversing to predict X from Y, it
will suffice if the regression of Y on X, i.e., for predicting Y from X, be
linear and the Y or vertical array distributions be homoscedastic. The use
of the correlation coefficient in predicting performance from age may be
cited as an instance in which there is no need to worry about the possible
nonlinear regression of age on score or the lack of homoscedasticity about
that regression line.

Although there are adequate checks for linearity and homoscedasticity,
inspection of the scatter diagram is usually sufficient to warn
us of violent departures from these assumptions. Formulas and
nonplotting schemes for computing r give no inkling as to whether the
assumptions are being violated and therefore cannot command the
confidence of the careful investigator. The purpose of a research project might
very well be the study of the relationship between two variables, but an
end result in terms of a correlation coefficient, with no attention given to
the form of the relationship, is inadequate.


VARIANCE AND CORRELATION 


A third method of interpreting r is in terms of variance. Before discussing 
this interpretation, we must introduce an important theorem concerning
the variance of a sum (or difference). Suppose that variable W is made up
of two parts U and V such that W = U + V. For example, the score on
an arithmetic test might consist of two parts: score in addition and score
in multiplication. Obviously, w = u + v, and therefore the variance of
the W variable is

$$
\begin{aligned}
S_w^2 &= \frac{\sum w^2}{N} \\
&= \frac{1}{N} \sum (u + v)^2 \\
&= \frac{1}{N} \left( \sum u^2 + \sum v^2 + 2 \sum uv \right) \\
&= S_u^2 + S_v^2 + 2 r_{uv} S_u S_v
\end{aligned}
\tag{9.9}
$$

and in case U and V are independent, we have

$$S_w^2 = S_u^2 + S_v^2 \tag{9.10}$$

If we are dealing with the difference, W = U - V, we have

$$S_w^2 = S_u^2 + S_v^2 - 2 r_{uv} S_u S_v \tag{9.11}$$

and for U and V independent, we have

$$S_w^2 = S_u^2 + S_v^2$$

which is identical with (9.10). In words, the variance of a sum (or difference)
of two independent variables is equal to the sum of their separate variances.
Variances are additive, whereas standard deviations are not. It can be
shown that, when U and V are distributed normally, their sum or difference
will also yield a normal distribution.
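The variance theorem can be checked on any set of paired scores. The sketch below is an illustration only; the addition and multiplication scores are hypothetical, as are the function names. It compares the variance of the sum and of the difference, computed directly, with the values given by formulas (9.9) and (9.11).

```python
import math

def variance(values):
    """Variance with N in the denominator, as in the text."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def correlation(us, vs):
    """Product-moment r from deviations about the two means."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    mean_cross = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n
    return mean_cross / math.sqrt(variance(us) * variance(vs))

# Hypothetical part scores on an arithmetic test.
addition = [12, 15, 9, 14, 10, 16, 11, 13]
multiplication = [8, 11, 7, 9, 10, 12, 6, 9]

total = [u + v for u, v in zip(addition, multiplication)]
difference = [u - v for u, v in zip(addition, multiplication)]

r_uv = correlation(addition, multiplication)
s_u = math.sqrt(variance(addition))
s_v = math.sqrt(variance(multiplication))
cross_term = 2 * r_uv * s_u * s_v

sum_direct = variance(total)
sum_formula = variance(addition) + variance(multiplication) + cross_term   # (9.9)
diff_direct = variance(difference)
diff_formula = variance(addition) + variance(multiplication) - cross_term  # (9.11)

print(sum_direct, sum_formula)
print(diff_direct, diff_formula)
```

When the two parts are uncorrelated the cross term vanishes and both formulas reduce to the simple sum of the separate variances.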

Now, with regard to the third method for interpreting r, let us note that 
in deviation units an observed y can be thought of as made up of two
independent parts, the part which can be predicted from x, namely y',
and the residual or unpredictable part, (y - y'). Before going further we
must demonstrate that y' and (y - y') are really independent. The
numerator for the correlation between y' and (y - y') can be expressed as
$\sum y'(y - y')$. But, since $y' = r \frac{S_y}{S_x} x$ and $(y - y') = y - r \frac{S_y}{S_x} x$, we have

$$
\begin{aligned}
\sum y'(y - y') &= \sum r \frac{S_y}{S_x} x \left( y - r \frac{S_y}{S_x} x \right) \\
&= r \frac{S_y}{S_x} \sum xy - r^2 \frac{S_y^2}{S_x^2} \sum x^2 \\
&= r \frac{S_y}{S_x} N r S_x S_y - r^2 \frac{S_y^2}{S_x^2} N S_x^2 \\
&= N r^2 S_y^2 - N r^2 S_y^2
\end{aligned}
$$

which is seen to be zero; hence y' and (y - y') are uncorrelated.

We have y = y' + (y - y'); whence, by the foregoing variance theorem,

$$S_y^2 = S_{y'}^2 + S_{y \cdot x}^2 \tag{9.12}$$

in which $S_{y \cdot x}^2$ is the variance of the residuals, (y - y'). If we divide both
sides of this equation by $S_y^2$, we get

$$1 = \frac{S_{y'}^2}{S_y^2} + \frac{S_{y \cdot x}^2}{S_y^2} \tag{9.13}$$

from which we see that, since the two ratios add to unity, either one can be
interpreted as a proportion (or a percentage by shifting the decimal
point). Thus the ratio of $S_{y'}^2$ to $S_y^2$ is the proportion of the variance in Y
which can be predicted from X, and the ratio of $S_{y \cdot x}^2$ to $S_y^2$ represents the
proportion of the variation (variance) of Y which is left over or remains or
cannot be predicted from X. A little reflection as to the meaning of this
residual variance should convince the student that we are here dealing
with the same variance which results if we square formula (9.5), thus

$$S_{y \cdot x}^2 = S_y^2 (1 - r^2)$$

which means that

$$\frac{S_{y \cdot x}^2}{S_y^2} = 1 - r^2$$

When we substitute this value into (9.13), we have

$$1 = \frac{S_{y'}^2}{S_y^2} + 1 - r^2$$

from which it is readily seen that the ratio

$$\frac{S_{y'}^2}{S_y^2} = r^2 \tag{9.14}$$

That is, the square of the correlation coefficient gives the proportion of the
total variance of Y which is predictable from X, or $r^2$ measures the
proportion of the total variance which can be attributed to variation in X.
The proportion of the variance of Y which is due to variables other than
X is given by $1 - r^2$. By shifting decimals, we can think of $r^2$ as indicating
the percentage of variance which has been explained, and $1 - r^2$ as the
percentage due to other causes. It will be noted that $r^2$, and not r, can be
so interpreted. This is true because variances are
additive whereas standard deviations are not. It should be emphasized
that $r^2$ as a proportion has to do with variation expressed technically as
variance.

It is of some interest to examine the meaning of $S_{y'}^2$. It is the square of
the standard deviation of the estimated values, and, with reference to the
scatter diagram, $S_{y'}$ corresponds approximately to what we would obtain
if we were to compute the standard deviation about $M_y$ of the vertical array
means, each weighted according to the number of cases in its array.

This third method of interpreting a correlation coefficient assumes
linearity of the regression line involved in predicting Y as the dependent
variable from X as the independent variable; i.e., the regression of
Y on X must be linear. If X were considered as the dependent variable, the
interpretation that $r^2$ indicates the proportion of the variance of X
explained by Y would assume linearity for the regression of X on Y. The
assumption of linearity becomes explicit if it is recalled that y' is obtained
from the linear formula (9.1), and it was implied when we used $S_{y \cdot x}^2$, in that
this residual variance is taken about a straight line. This interpretation does not
assume homoscedasticity, nor does it assume normality either for the
arrays or for the marginal distributions.

Caution is needed as to causes whenever the correlation coefficient is
interpreted in terms of variance. The problem is frequently one in which an
attempt is made to explain variation in one trait in terms of variation in
another which is conceived of as being more basic. The use of $r^2$ as the
percentage of the variance of a trait which is predictable by, or attributable
to, variation in a second variable becomes a valuable tool in the analysis of
variation. Of course, a correlation does not of itself tell us that variation in
one variable is the cause of variation in another. Logic, not statistical
method, must be invoked to determine whether a causal relationship exists,
and the statistical interpretation modified accordingly. Variation in X might
cause variation in Y, or vice versa, or variation in both X and Y might be due
to the influence of some other variable or variables.

To illustrate the interpretation of $r^2$ as a percentage, let us suppose we
have the scores of a group of school children on a substitution test.
Considerable variation in scores will be present, and we may rightly
suspect that a portion of this variation is due to age differences. We can
determine the correlation between age and performance. Suppose r = .60;
this can be interpreted by saying that 36 per cent of the variance in
performance is due to age differences, and 64 per cent is due to other
causes. Likewise, the variance in crop yield due to variation in rainfall
can be determined; or the variance in height of a group of men may
be analyzed into two or more parts, one of which might be the portion due
to variation in the heights of their fathers.
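The splitting of the variance of Y into a predicted and a residual part, formulas (9.12) to (9.14), can be verified on a small set of scores. The sketch below is an illustration only; the ages and substitution-test scores are hypothetical, and the function names are assumptions made for this example.

```python
import math

def variance(values):
    """Variance with N in the denominator, as in the text."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def decompose(xs, ys):
    """Return r, the variance of Y, of the predicted Y', and of the residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x, var_y = variance(xs), variance(ys)
    mean_cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    r = mean_cross / math.sqrt(var_x * var_y)
    slope = r * math.sqrt(var_y / var_x)           # r times Sy over Sx
    predicted = [my + slope * (x - mx) for x in xs]
    residuals = [y - p for y, p in zip(ys, predicted)]
    return r, var_y, variance(predicted), variance(residuals)

# Hypothetical ages (years) and substitution-test scores.
ages = [8, 9, 9, 10, 11, 11, 12, 13, 13, 14]
scores = [21, 25, 19, 30, 27, 35, 31, 40, 33, 38]

r, var_y, var_pred, var_resid = decompose(ages, scores)
print(round(r * r, 4), round(var_pred / var_y, 4))    # proportion predictable
print(round(1 - r * r, 4), round(var_resid / var_y, 4))  # proportion left over
```

The two numbers on each printed line should coincide, which is simply formula (9.14) and its complement.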

CORRELATION AND COMMON ELEMENTS 

A fourth possible interpretation of the correlation coefficient assumes
that each of the two variables can be thought of as a summation of a
number of equally potent, equally likely, independent elements, which can
be either present or absent. Then the degree of correlation is a function of
the number of elements common to the two variables. The general
formula is

$$r_{xy} = \frac{n_c}{\sqrt{n_x + n_c} \, \sqrt{n_y + n_c}} \tag{9.15}$$

in which $n_x$ equals the number of elements unique to X, $n_y$ the number
unique to Y, and $n_c$ the number common to both variables. If the number
of elements in X equals the number in Y, r gives the proportion of elements
common to X and Y; if X is determined only by elements common to Y
(that is, if X has no unique elements), $r^2$ gives the proportion of elements
entering into Y which determine X. There is little, if any, factual basis
for believing that the assumptions stated are tenable so far as psychological
variables are concerned, and therefore the interpretation of the correlation
coefficient in terms of common elements may be viewed with scepticism.

NORMAL CORRELATION

A fifth interpretation of r is more mathematical but of little practical 
value. We have already seen how a frequency distribution and its polygon 
can be thought of as smooth, conforming perhaps to the equation of the 
normal curve. A correlation table is a frequency distribution, a picture or 
graph of which requires a third dimension. If we were to replace each tally 
in a scatter diagram by a thin block, there would result something analogous
to the histogram except that it would be three dimensional: the
heights of the stacks of blocks would indicate the frequencies for the 
various cells. Now suppose that this mound of blocks is by some method 
smoothed to a surface, and we consider the total volume under the surface 
(between the surface and the XY plane) as representing N. Then the 
number of cases falling between two given X values and simultaneously 
between two given Y values will be approximately the volume of that 
portion of the mound which has as its base the rectangle or square formed 
by the intersections of the two X and two Y values. If the regression lines 
are linear, if the array distributions are normal and homoscedastic, and if 
the marginal distributions are normal, the resulting surface is termed the 
normal correlation surface, and the equation of the surface can be written
as

$$w = \frac{N}{2\pi \sigma_x \sigma_y \sqrt{1 - r^2}} \, e^{-\frac{1}{2(1 - r^2)} \left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x \sigma_y} \right)} \tag{9.16}$$

A number of important properties of the normal correlation surface can
be deduced from this equation and its integral. For instance, the standard
error of estimate can be derived from formula (9.16), and it can also be
shown that the contour lines which represent different altitudes on the
mound, i.e., different frequencies, will be concentric ellipses, and that if
r = 0, the contour lines will become concentric circles. If the equation is
written with N equal to unity, by double integration the probability of an
individual's falling between two particular Y values and between two X
values can be determined. Tables are available which can be utilized for
this purpose.‡
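Where tables are not at hand, the double integration can be approximated numerically. The sketch below is an illustration only and is not the method of the tables cited; it evaluates the surface of formula (9.16) with N equal to unity and sums it over a fine grid by the midpoint rule. The function names, the grid size, and the choice of r = .60 with unit standard deviations are all assumptions made for this example.

```python
import math

def normal_surface(x, y, r, sigma_x=1.0, sigma_y=1.0, n=1.0):
    """Ordinate of the normal correlation surface; x and y are deviations."""
    q = ((x / sigma_x) ** 2 + (y / sigma_y) ** 2
         - 2.0 * r * x * y / (sigma_x * sigma_y))
    scale = n / (2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - r * r))
    return scale * math.exp(-q / (2.0 * (1.0 - r * r)))

def volume(r, x_lo, x_hi, y_lo, y_hi, steps=200):
    """Midpoint-rule volume under the surface over a rectangle (unit sigmas)."""
    dx = (x_hi - x_lo) / steps
    dy = (y_hi - y_lo) / steps
    total = 0.0
    for i in range(steps):
        x = x_lo + (i + 0.5) * dx
        for j in range(steps):
            y = y_lo + (j + 0.5) * dy
            total += normal_surface(x, y, r)
    return total * dx * dy

# Nearly the whole volume lies within six standard deviations of the means.
whole = volume(0.60, -6, 6, -6, 6)
# Proportion of cases above the mean on both variables when r = .60.
both_above = volume(0.60, 0, 6, 0, 6)
print(round(whole, 4), round(both_above, 4))
```

With r = 0 the same call for the upper-right quadrant returns one-fourth, as it must when the two variables are unrelated.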

LIMITS FOR r 

Attention is called to the fact that definition formula (8.1) becomes
$r = \sum z_x z_y / N$ when written in terms of standard scores for both variables.
This indicates specifically that the correlation coefficient is a statistical
average, the average of the cross products of standard scores. Suppose that
we ask what happens when the correlation is perfect in the sense that each
individual's $z_x$ score equals his $z_y$ score. If this is true, the sum $\sum z_x z_y$ would
be the same as $\sum z^2$, which when divided by N gives 1.00. Thus the upper
limit for r is +1.00. Now suppose a perfect inverse relationship, such that
an individual's $z_x$ and $z_y$ are the same except for sign, one being positive
whereas the other is negative. If this holds true for all the cases, the sum
$\sum z_x z_y$ can be written as $\sum z(-z)$ or $-\sum z^2$, which when divided by N gives
-1.00 as the limit for perfect negative correlation.

‡ Pearson, Karl, Tables for statisticians and biometricians, part II, Cambridge:
Cambridge University Press, 1931. See Tables 8 and 9.

As exercises, the student should show that multiplying or dividing 
either X or Y or both by a constant, or X by one constant and Y by another, 
will not change r, and that adding or subtracting a constant does not affect 
the value of r. 
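These properties can also be seen numerically before they are proved. The sketch below is an illustration only; the scores, the constants, and the function names are hypothetical. It computes r as the average cross product of standard scores and repeats the computation after each variable has been multiplied by a positive constant and shifted by another constant.

```python
import math

def standard_scores(values):
    """Deviations from the mean divided by the standard deviation (N divisor)."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / s for v in values]

def r_from_standard_scores(xs, ys):
    """r as the mean of the cross products of standard scores."""
    zx, zy = standard_scores(xs), standard_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Hypothetical paired scores.
x = [3, 7, 5, 9, 4, 8, 6, 10]
y = [12, 18, 13, 20, 15, 16, 17, 22]

r_original = r_from_standard_scores(x, y)
# Multiply by positive constants and add or subtract other constants.
r_rescaled = r_from_standard_scores([3 * v + 10 for v in x],
                                    [v / 4 - 2 for v in y])
r_upper = r_from_standard_scores(x, x)                 # identical z scores
r_lower = r_from_standard_scores(x, [-v for v in x])   # z scores opposite in sign

print(round(r_original, 4), round(r_rescaled, 4), round(r_upper, 4), round(r_lower, 4))
```

The constants used here are positive; multiplying one variable by a negative constant reverses its scale and therefore the sign of r, though not its size.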

SUMMARY 

The five suggested methods for interpreting the correlation coefficient 
may be briefly summarized here. 

1. r is associated with the rate at which one variable changes with 
another. This assumes that the regression line so interpreted is linear. 

2. r tells us how accurately we can predict by a regression equation. 
The standard error of estimate permits one to infer the possible magnitude 
of the prediction error, whereas the coefficient of alienation indicates the 
reduction in error over that error which would exist if there were no 
correlation. This interpretation assumes that the regression line used in 
predicting is linear and that variation about this line is normal and
homoscedastic.

3. $r^2$ gives the proportion of variance in Y predictable from, or
attributable to, variation in X. This assumes linearity for the regression of Y on X
and requires caution in assuming the direction of cause and effect.

The student should attempt to visualize the meaning of these three
principal methods of interpreting correlation. In particular, he should
note the meaning of $S_y$, $S_{y'}$, and $S_{y \cdot x}$ (or their counterparts with the
subscripts y and x interchanged). The first, $S_y$, holds for the marginal
distribution of all Ys; $S_{y'}$ pertains to the variability of all Y values as predicted
from X; the third, $S_{y \cdot x}$, is a measure of the variation about the regression
line for predicting Y from X.

For none of these three interpretations of r do we have to assume normal
distributions on the margins. However, it is possible and likely that
nonlinearity, lack of homoscedasticity, and nonnormality of arrays will
tend to be associated with skewness in one or both the marginal
distributions.

4. r or $r^2$ can be interpreted in terms of the proportion of elements
common to the two variables provided we are willing to make rather
hazardous and unrealistic assumptions as regards the nature of the
variables.

5. r can be interpreted mathematically in terms of the equation for the
normal correlation surface. This assumes that both regressions are linear,
that homoscedasticity and normality hold for both the horizontal and
vertical array distributions, and that both marginal distributions are
normal in form.

The nature of the investigation will usually dictate or suggest the
appropriate interpretation. Ordinarily the fifth will not be used in connection
with the application of the correlational method, whereas the fourth
rests on assumptions which can seldom be met.




Chapter 10 

FACTORS WHICH AFFECT THE 
CORRELATION COEFFICIENT 


Before we interpret or draw conclusions from a particular correlation
coefficient, it is necessary that we ask ourselves what factors might have
affected its magnitude. The size of an obtained r depends upon several
specific conditions, and, even though it is not always essential that
corrections be applied, the investigator must forever be on the lookout for
correlations which deviate from their "true" value because of the operation
of disturbers. This chapter is devoted to a discussion of the more common
factors which influence r.

It is assumed that errors in computation have not been permitted—that 
all arithmetical work has been checked. It is also assumed that sufficient 
intervals have been used so as to make unnecessary the application of 
Sheppard’s correction for grouping; if more than twelve intervals have 
been used, the slight increase in r which results from correcting the standard
deviations will be negligible. Certain textbooks have advocated a
correction to r for smallness of the sample, which correction reduces r by 
a negligible amount. In view of the magnitude of the effects of other 
factors on r, these two possible corrections seem trifling. 


SELECTION 

One of the first questions which must be faced is: Do the cases on
which r is based represent a random sampling of some defined population,
or have selective factors so operated as to increase or decrease r? The
literature of psychology is not free from correlation coefficients which are
decidedly different from values that would have been obtained had the
sampling been random. This is not to say that any investigator has willfully
selected cases so as to produce correlation but rather to say that
unwitting errors are frequently present in spite of an effort to avoid
selective factors.


SAMPLING ERRORS 

Even though we may feel reasonably sure of the randomness of the 
sample on which an r is based, it is still necessary to consider the obtained r 
in terms of variable errors due to sampling. Any r based on N pairs of
observations will differ more or less from the universe or population
value, which is here conceived of as the value of the correlation coefficient
which we would obtain if we had an infinitely large sample. Many of the
older texts gave $(1 - r^2)/\sqrt{N}$ as the standard error of r, but failed to point
out a serious limitation as regards interpretation: that this is an
approximation and that rs for successive samples are not distributed normally
unless N is large and/or the universe value is near zero.

Before further discussion it should be said that some measure of the
sampling fluctuation of the correlation coefficient is highly desirable for
any of three reasons. (1) We may wish to say whether an obtained r can
be taken as representing a real, nonchance, correlation, i.e., whether it
deviates sufficiently far from zero so that we cannot regard it as a chance
fluctuation from no relationship; (2) we may wonder whether an obtained r
deviates significantly from some a priori or expected value; or (3) we may
raise the question of whether two obtained rs are significantly different from
each other. The answers to these questions must be in terms of probability,
and the probability figure which we accept as indicating significance
determines the confidence with which we regard any such conclusions as
we may draw.

If N is greater than 50, and if we are interested in saying whether or not
an r (of .50 or less, usually) is significantly different from zero, we can
determine its standard error by

$$\sigma_r = \frac{1}{\sqrt{N}} \tag{10.1}$$

and then divide the obtained r by this standard error in order to secure a z
value with which to enter the normal probability table. If $r/\sigma_r$ is greater
than 2.58, we can conclude with a fairly high degree of sureness that the
true or universe value of r is likely to be greater than zero.
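The large-sample test amounts to a single division. The sketch below is an illustration only; the r of .30 and N of 100 are hypothetical, as is the function name, and the absolute value is taken so that a negative r is judged by the same standard, which is an assumption of this example rather than a statement of the text.

```python
import math

def ratio_for_r(r, n):
    """r divided by its standard error 1/sqrt(N), formula (10.1); N above 50."""
    sigma_r = 1.0 / math.sqrt(n)
    return r / sigma_r

ratio = ratio_for_r(0.30, 100)       # hypothetical r and N
significant = abs(ratio) > 2.58      # the criterion used in the text

print(round(ratio, 2), significant)
```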

For N less than 50, it is necessary to follow a different procedure. It can 
be shown that, if the correlation coefficient is computed for successive 
samples drawn from a population for which the correlation is zero, the
successive values of

$$t = \frac{r \sqrt{N - 2}}{\sqrt{1 - r^2}} = \frac{r}{\sqrt{(1 - r^2)/(N - 2)}} \tag{10.2}$$

will follow the t distribution with df = N - 2. If a sample t reaches the
level required for significance, we would conclude that r is not a chance
deviation from zero or that a real correlation exists between the two variables
involved.
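Formula (10.2) is readily put to work for small samples. The sketch below is an illustration only; the r of .50 and N of 27 are hypothetical, as is the function name. It returns t together with the degrees of freedom with which a table of t is to be entered.

```python
import math

def t_for_r(r, n):
    """t of formula (10.2) and its degrees of freedom, N - 2."""
    if n <= 2:
        raise ValueError("N must exceed 2; with N = 2 there is no freedom left")
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return t, n - 2

t_value, df = t_for_r(0.50, 27)   # hypothetical r and N
print(round(t_value, 3), df)
```

The value obtained is then compared with the tabled t for the given df at whatever level of significance has been adopted.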

From the foregoing expression, it would appear that the t for testing
r involves $\sqrt{(1 - r^2)/(N - 2)}$ as an estimate of the sampling error of r. However, ...

The student may wonder why the df is taken as N - 2. Actually, when we
test the significance of r, we are testing the significance of regression. If
r is zero, the regression is zero in the sense that the regression coefficient or
slope of the regression line is zero. Now a linear regression line involves
two constants, the slope and the intercept; hence 2 degrees of freedom are
lost in fitting the line. Suppose N = 2, and that the two X scores differ;
likewise the two Y scores. Imagine these pairs of scores plotted in a
scatter diagram, and a regression line fitted or a correlation coefficient
computed. The regression line must go through both plotted points;
therefore for the sample of two cases the prediction would be perfect and r,
as can also be shown algebraically, must be +1 or -1. In other words,
with N = 2 there is no freedom for
sampling variation in the numerical value of r.

The t test of r assumes normality and homoscedasticity either for the
vertical array or for the horizontal array distributions. Nothing is assumed
about the total X and Y distributions. There is evidence, as with the t test
of means, that sizable violations of the assumptions are tolerable, but
there is always comfort in knowing that the assumptions are fairly well
met.

It might be remarked at this place that if the sum of squares of the
deviations about a regression line were divided by N - 2, then we would
have an unbiased estimate of the error variance. This refinement
seems unnecessary in the practical situation where prediction
(regression) equations are usually based on sizable Ns.

The standard error of r, when r is large, is misleading,
because for high values of r the distribution of successive sample values
is markedly skewed. This skewness becomes noticeable when r reaches
.40 or .50 and increases rapidly as r nears unity. The skewness is also
a function of N. Because of this skewness the standard error of r loses its
meaning; it cannot be expected to yield a trustworthy answer as to whether
an obtained r deviates significantly from some a priori value, nor can the
significance of the difference between two rs be determined by substituting
in the ordinary formula for the standard error of a difference.

The r to z transformation. Professor R. A. Fisher has developed a
very useful and accurate technique for handling sampling errors for high
values of r. This procedure is also applicable for low rs and can be used
when N is large or small. He employs a transformation

$$z = \tfrac{1}{2} \log_e (1 + r) - \tfrac{1}{2} \log_e (1 - r) \tag{10.3}$$

or

$$z = 1.1513 \log_{10} \frac{1 + r}{1 - r} \tag{10.4}$$

which has two distinct advantages: (1) the distribution of z for successive
samples is independent of the universe value, i.e., for a given N the sampling
distribution will have the same dispersion for all values of r; (2) the
distribution of z for successive samples is so nearly normal that it can be
treated as such with very little loss of accuracy. The standard error of z is

$$\sigma_z = \frac{1}{\sqrt{N - 3}} \tag{10.5}$$

Note on notation: Since the standard error of z is a theoretical value,
dependent solely on N, and hence does not involve estimation from the
sample, it is symbolized as $\sigma_z$ rather than as $S_z$ or $s_z$.

If we wish to state the .99 confidence limits for $r_{pop}$, we transform the
obtained r to z by formula (10.4) or by Table B of the Appendix, determine
$\sigma_z$, find $z + 2.58\sigma_z$ and $z - 2.58\sigma_z$, and then transform
these two z values back to rs by using Table C. As an example, and in contrast
to the less exact procedure of taking $r \pm 2.58 S_r$, where
$S_r = (1 - r^2)/\sqrt{N}$, let us suppose
an r of .90 based on an N of 50. The standard error of r by the usual
formula is .027; whence .90 ± (2.58)(.027) yields the values .830 and .970
as confidence limits for the universe value. Now if we utilize the z
transformation, we find z = 1.47, and $\sigma_z$ = .146, whence
1.47 ± (2.58)(.146) gives 1.093 and 1.847. These two values are then
transformed back to the two r values, .798 and .951, which it will be noted
differ from the confidence limits for $r_{pop}$ as determined by using the
classical $S_r$.
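The worked example above (r = .90, N = 50) can be checked with a short Python sketch. The function names are illustrative, not the author's, and the inverse transformation is computed with the hyperbolic tangent where the text uses Table C.

```python
import math

def fisher_z(r):
    """Formula (10.3): z = 1/2 ln(1 + r) - 1/2 ln(1 - r)."""
    return 0.5 * math.log(1 + r) - 0.5 * math.log(1 - r)

def z_to_r(z):
    """Inverse of the z transformation (the job done by Table C)."""
    return math.tanh(z)

def z_limits(r, n, critical=2.58):
    """Confidence limits for the universe r obtained by way of z."""
    z = fisher_z(r)
    sigma_z = 1.0 / math.sqrt(n - 3)  # formula (10.5)
    return z_to_r(z - critical * sigma_z), z_to_r(z + critical * sigma_z)

def classical_limits(r, n, critical=2.58):
    """Less exact limits based on S_r = (1 - r^2) / sqrt(N)."""
    s_r = (1 - r ** 2) / math.sqrt(n)
    return r - critical * s_r, r + critical * s_r

z_low, z_high = z_limits(0.90, 50)
c_low, c_high = classical_limits(0.90, 50)
print(f"via z:     {z_low:.3f} to {z_high:.3f}")
print(f"classical: {c_low:.3f} to {c_high:.3f}")
```

Because the sketch carries full precision rather than the rounded z = 1.47 and $\sigma_z$ = .146, its limits may differ from the text's in the third decimal place. The limits obtained by way of z are not symmetric about .90, which reflects the skewness described earlier.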

Note: Since the foregoing z is not a relative deviate, it should not be
referred to as a standard score.

Difference between rs. If we wish to determine the significance of
the difference between two rs, both are transformed into zs, and the
standard error of the difference between the two zs is obtained by

\[ \sigma_{z_1 - z_2} = \sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}} \]

and then the ratio of the difference to its standard error is treated in the
usual manner. If the zs are significantly different, we conclude that the
two rs are significantly different.
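As a minimal sketch of this test for two independent samples, the following Python function returns the ratio of the difference between the zs to its standard error. The sample values are hypothetical and chosen only for illustration.

```python
import math

def ratio_for_independent_rs(r1, n1, r2, n2):
    """Difference between two z-transformed rs over its standard error."""
    z1 = math.atanh(r1)  # same value as formula (10.3)
    z2 = math.atanh(r2)
    sigma_diff = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / sigma_diff

# Hypothetical data: r = .60 and r = .40, each from 103 cases.
ratio = ratio_for_independent_rs(0.60, 103, 0.40, 103)
print(round(ratio, 2))
```

The ratio is referred to the normal table in the usual manner, for example against 1.96 or 2.58.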

Suppose we have the correlation between $X_1$ and $X_2$ and also between
$X_1$ and $X_3$, with both rs based on the same sample of N cases, and we wish
to decide whether there is a significant difference between $r_{12}$ and
$r_{13}$. The foregoing method is not applicable because we need to allow for
the fact that, for successive samplings, $r_{12}$ and $r_{13}$ are not
independently distributed, but correlated. The standard error of the
difference must include a subtractive r term involving the correlation between
the correlation coefficients. The methods for estimating this needed
correlation are none too satisfactory, but there is a test which is
interpretable by way of the t table for N small and by way of the normal table
for N large. It has been shown that

\[ t = \frac{(r_{12} - r_{13})\sqrt{(N - 3)(1 + r_{23})}}{\sqrt{2(1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2 r_{12} r_{13} r_{23})}} \]

follows the t distribution with N - 3 degrees of freedom when the null
hypothesis of no difference is true. If t is significant, we conclude that one
variable correlates higher than the other with $X_1$.
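The t ratio for two correlated rs can be sketched in Python as follows. The three correlations and the N are hypothetical values used only to show the arithmetic.

```python
import math

def t_for_correlated_rs(r12, r13, r23, n):
    """t for the difference between r12 and r13 from one sample of n cases.

    Returns the t ratio and its degrees of freedom (n - 3).
    """
    radicand = 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23
    t = (r12 - r13) * math.sqrt((n - 3) * (1 + r23)) / math.sqrt(2 * radicand)
    return t, n - 3

# Hypothetical values: r12 = .50, r13 = .30, r23 = .40, N = 103.
t, df = t_for_correlated_rs(0.50, 0.30, 0.40, 103)
print(round(t, 3), df)
```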
Averaging correlations. When we have two (or more) sample values 
for the correlation between two variables we may wish to average the rs 
(1) in case it is known that the samples have been drawn from the same 
population or (2) in case it can be assumed (because the rs are not signifi¬ 
cantly different from each other) that the samples have been drawn from 
equally correlated populations. An appropriate procedure is to convert 
each r to z, then take a weighted (each z by the inverse of its sampling 
variance) average of the zs. Thus, for three sample values this weighted 
average is given by 

\[ z_{av} = \frac{(N_1 - 3)z_1 + (N_2 - 3)z_2 + (N_3 - 3)z_3}{(N_1 - 3) + (N_2 - 3) + (N_3 - 3)} \]

This $z_{av}$ can be transformed back to an r, and any significance test
concerning such an average r would be made on $z_{av}$, which has a standard
error of

\[ \frac{1}{\sqrt{(N_1 - 3) + (N_2 - 3) + (N_3 - 3)}} \]
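A short Python sketch of the weighted averaging, written for any number of samples rather than just three. The rs and Ns in the example are hypothetical.

```python
import math

def average_r(rs, ns):
    """Average several rs by way of z, weighting each z by N - 3.

    Returns (average r, average z, standard error of the average z).
    """
    weights = [n - 3 for n in ns]
    zs = [math.atanh(r) for r in rs]
    z_av = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    se_z_av = 1.0 / math.sqrt(sum(weights))
    return math.tanh(z_av), z_av, se_z_av

# Hypothetical samples: three rs based on 53, 53, and 103 cases.
r_av, z_av, se = average_r([0.50, 0.60, 0.55], [53, 53, 103])
print(round(r_av, 3), round(se, 4))
```

The weight N - 3 is the inverse of the sampling variance of z given by formula (10.5), so the larger sample counts for more in the average.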

Sampling errors of regression coefficients. Those who are ascertaining
relationships involving two variables, one of which can definitely be
characterized as independent (X) and the other as dependent (Y), or one
as an antecedent and the other as a consequent variable, may prefer to
specify the relationship in terms of the regression constants, B and A.
Both of these are, of course, estimates of unknown population values and
are therefore subject to sampling errors. Ordinarily A is of little interest
whereas B specifies the rate of change and may be used for testing
hypotheses. Does the sample slope vary significantly from some hypothetical
value, $B_h$? Or does the slope differ significantly from zero? Occasionally,
we may have two regression coefficients (same variables) and wish to test
the significance of the difference between the two slopes, $B_1$ and $B_2$.

To test whether or not a single slope, B, differs significantly from zero or
some other a priori value, we need an estimate of its standard error. The
classical standard error of $B_{yx}$ was taken as

\[ S_{B_{yx}} = \frac{S_{y \cdot x}}{S_x \sqrt{N}} = \frac{S_y \sqrt{1 - r^2}}{S_x \sqrt{N}} \]

Note that the formula involves a ratio which is a function of the Y
measurement units relative to the X measurement units. This is reasonable since
the slope is also a ratio of Y to X units. (The standard error of any
statistic must be in the same units as the statistic.) The upstairs part
of the formula is the familiar standard error of estimate. It is reasonable
that the sampling stability of $B_{yx}$ should depend on the variability within
the arrays because, in effect, we are fitting the regression line to the
array means (weighted by their Ns), and the stability of these means is a
function of the array variances.

Greater precision in testing hypotheses regarding $B_{yx}$ will be achieved by
having an unbiased estimate of its sampling error. It will facilitate
exposition to note that for any variable, V, the variance is given by
$S_v^2 = \Sigma v^2 / N$. If we have $S_v^2$ and wish to recover the sum of the
squares of the deviations, $\Sigma v^2$, we can use $\Sigma v^2 = N S_v^2$. To
secure an unbiased estimate we would have
$s_v^2 = \Sigma v^2 / (N - 1) = N S_v^2 / (N - 1)$. Parenthetically, it might
be noted that since $S_y / S_x = s_y / s_x$, the slope would be
unchanged by introducing unbiased estimates of the two standard
deviations.

For the sampling variance of $B_{yx}$ we need

\[ s_{B_{yx}}^2 = \frac{s_{y \cdot x}^2}{N S_x^2} \qquad (10.8) \]

in which for reasons not yet specified we have allowed a mixture of biased
and unbiased estimates. To get $s_{y \cdot x}^2$ we would need to divide
$\Sigma (Y - Y')^2$ by its df, N - 2. But
$S_{y \cdot x}^2 = \Sigma (Y - Y')^2 / N$, hence
$\Sigma (Y - Y')^2 = N S_{y \cdot x}^2 = N S_y^2 (1 - r^2)$. Therefore

\[ s_{y \cdot x}^2 = \frac{\Sigma (Y - Y')^2}{N - 2} = \frac{N S_y^2 (1 - r^2)}{N - 2} \]

which leads to

\[ s_{B_{yx}}^2 = \frac{s_{y \cdot x}^2}{N S_x^2} = \frac{N S_y^2 (1 - r^2) / (N - 2)}{N S_x^2} = \frac{S_y^2 (1 - r^2)}{S_x^2 (N - 2)} \]

the square root of which gives

\[ s_{B_{yx}} = \frac{S_y \sqrt{1 - r^2}}{S_x \sqrt{N - 2}} \qquad (10.9) \]


To test the null hypothesis that the slope for the population is zero, we
have

\[ t = \frac{B_{yx}}{s_{B_{yx}}} = \frac{r (S_y / S_x)}{S_y \sqrt{1 - r^2} / (S_x \sqrt{N - 2})} \]

with N - 2 degrees of freedom. Note that since $S_y / S_x$ cancel, it follows
that if we used unbiased estimates, $s_y$ and $s_x$, they would also cancel.
With the Ss cancelled we have the ratio

\[ t = \frac{r}{\sqrt{(1 - r^2) / (N - 2)}} \]

which is the t test for the significance of r. When testing the null hypothesis
of zero slope we are doing nothing more than testing the null hypothesis of
zero correlation, a fact that certainly could have been anticipated since
the only way that $B_{yx} = r (S_y / S_x)$ can be zero is for r to be zero. The
mathematical purist can say that B could be zero when $S_y$ is zero, but when
$S_y$ is zero the slope and r become indeterminates: you cannot have a
relationship or a slope in the absence of variation for either variable.
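The identity of the two t tests can be verified numerically. The Python sketch below computes the slope, its unbiased standard error from the residual sum of squares, and the t for r, using a small set of hypothetical scores.

```python
import math

def slope_t_and_r_t(xs, ys):
    """Return (t for zero slope, t for zero correlation) for paired scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xx = sum((x - mean_x) ** 2 for x in xs)  # equals N times S_x squared
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b = sum_xy / sum_xx              # slope B_yx
    a = mean_y - b * mean_x          # intercept A_yx
    resid_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

    # Formula (10.8) with the unbiased residual variance, df = N - 2.
    s_b = math.sqrt((resid_ss / (n - 2)) / sum_xx)
    t_slope = b / s_b

    r = sum_xy / math.sqrt(sum_xx * sum_yy)
    t_r = r / math.sqrt((1 - r ** 2) / (n - 2))
    return t_slope, t_r

# Hypothetical scores for eight persons.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 1, 4, 3, 6, 5, 8, 9]
t_slope, t_r = slope_t_and_r_t(xs, ys)
print(round(t_slope, 4), round(t_r, 4))
```

The two printed values agree apart from rounding error in the arithmetic, as the algebra requires.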

For those who have an aversion to r and standard deviations and who
persist in the erroneous belief that a test of $B_{yx}$ differs from a test of
r, it should be noted that r and the Ss can be avoided by expressing $B_{yx}$
as in equation (9.7) and taking the following as an expression for the unbiased
estimate of the sampling error of $B_{yx}$:

\[ s_{B_{yx}} = \sqrt{\frac{\Sigma (Y - Y')^2 / (N - 2)}{[N \Sigma X^2 - (\Sigma X)^2] / N}} \]

in which

\[ \Sigma (Y - Y')^2 = \Sigma Y^2 - A_{yx} \Sigma Y - B_{yx} \Sigma XY \qquad (10.10) \]

with $A_{yx}$ and $B_{yx}$ as in equation (9.7).

Difference between regression coefficients. Let $B_1$ and $B_2$ be the values
for $B_{yx}$ for two independent samples of $N_1$ and $N_2$ persons. The t test
for the difference between the two Bs is analogous to that for testing the
difference between independent means. We need $s_{D_B}$ which, it might be
guessed, will follow the usual pattern:

\[ s_{D_B} = \sqrt{s_{B_1}^2 + s_{B_2}^2} \]

for which we need the best unbiased estimates of the two variances under
the radical. Instead of using the two residual variances separately, a
combined estimate is obtained by combining the sums of squares of
deviations about the respective regression lines and then dividing by the
combined df, or $N_1 + N_2 - 4$. The best estimate of the (assumed to be)
common residual variance may be expressed in symbols as

\[ s_{y \cdot x}^2 = \frac{\Sigma (Y - Y')_1^2 + \Sigma (Y - Y')_2^2}{N_1 + N_2 - 4} \]

with each numerator term calculable by equation (10.10) written separately
for the two groups. If the rs and Ss have been computed, we may use the
exact equivalent

\[ s_{y \cdot x}^2 = \frac{N_1 S_{y_1}^2 (1 - r_1^2) + N_2 S_{y_2}^2 (1 - r_2^2)}{N_1 + N_2 - 4} \]

Then by utilizing (10.8), we have

\[ s_{D_B} = \sqrt{\frac{s_{y \cdot x}^2}{N_1 S_{x_1}^2} + \frac{s_{y \cdot x}^2}{N_2 S_{x_2}^2}} \]

and $t = (B_1 - B_2) / s_{D_B}$ as a ratio that follows the t distribution with
$N_1 + N_2 - 4$ degrees of freedom.
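A minimal Python sketch of the test for the difference between two slopes, assuming a common residual variance as the text does. The helper names and the two small samples are hypothetical.

```python
import math

def slope_parts(xs, ys):
    """Return (slope, sum of squared x deviations, residual sum of squares, n)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xx = sum((x - mean_x) ** 2 for x in xs)  # N times S_x squared
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sum_xy / sum_xx
    a = mean_y - b * mean_x
    resid_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return b, sum_xx, resid_ss, n

def t_for_slope_difference(sample1, sample2):
    """t for B1 - B2 with pooled residual variance, df = N1 + N2 - 4."""
    b1, sxx1, ss1, n1 = slope_parts(*sample1)
    b2, sxx2, ss2, n2 = slope_parts(*sample2)
    df = n1 + n2 - 4
    pooled = (ss1 + ss2) / df                       # common residual variance
    s_diff = math.sqrt(pooled / sxx1 + pooled / sxx2)  # by formula (10.8)
    return (b1 - b2) / s_diff, df

# Hypothetical samples of six persons each.
group1 = ([1, 2, 3, 4, 5, 6], [1.1, 1.9, 3.2, 3.8, 5.1, 6.0])
group2 = ([1, 2, 3, 4, 5, 6], [2.0, 2.4, 3.1, 3.3, 4.2, 4.4])
t, df = t_for_slope_difference(group1, group2)
print(round(t, 2), df)
```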

It is of interest to note that whereas the t tests for the two null hypotheses,
$r_{pop}$ zero and $B_{pop}$ zero, are identical, the test for the difference
between Bs is not the same as that for the difference between the two
respective rs. That this should be so may be understood by considering two rs
and two Bs based on two samples, one of which has a larger range of Xs (larger
$S_x$) than the other. The two slopes could very well be nearly identical, yet
one r be considerably higher than the other. This constitutes one argument for
"regression" instead of "correlational" analysis: for the X and Y as
independent-dependent variable situation, the slope is not a function of
curtailment on X, or the range of X scores.

RANGE OR SPREAD OF TALENT

The magnitude of the correlation coefficient varies with the degree of
heterogeneity (with respect to the traits being correlated) of the sample.
If we are drawing a sample from a group which is restricted in range with
regard to either or both variables, the correlation will be relatively low.
Thus the restricted range of intelligence is one factor which leads to lower
correlation between intelligence and grades for college students than that
usually found for high school groups. If the range with respect to one
variable has been curtailed, and we know the standard deviation for an
uncurtailed distribution, it is possible to adjust the correlation for the
difference in range, provided we can be sure of the tenability of two
assumptions: that the regressions are linear and that the arrays are
homoscedastic for the scatter based on the uncurtailed distribution. If the
curtailment is in variable X, and we let

    $sd_x$ = S for curtailed distribution
    $SD_x$ = S for uncurtailed distribution
    $r_{xy}$ = correlation for curtailed range
    $R_{xy}$ = correlation for uncurtailed range

the relationship by which we would predict $R_{xy}$ from $sd_x$, $SD_x$, and
$r_{xy}$ is given by

\[ R_{xy} = \frac{r_{xy} (SD_x / sd_x)}{\sqrt{1 - r_{xy}^2 + r_{xy}^2 (SD_x / sd_x)^2}} \qquad (10.11) \]

Table 10.1. Values for $r_{xy}$ for $R_{xy}$s of .30, .40, ... .80 with
$sd_x / SD_x$ values of .90, .80, ... .50

                          R_xy
    sd_x/SD_x    .30    .40    .50    .60    .70    .80
       .90      .272   .366   .461   .559   .662   .768
       .80      .244   .330   .419   .514   .617   .730
       .70      .215   .292   .375   .465   .566   .682
       .60      .185   .253   .327   .410   .507   .625
       .50      .155   .213   .277   .351   .440   .555

Obviously, if we have R instead of r, the value of r for a restricted range can
be estimated by formula (10.11). All we need to do is interchange $SD_x$
and $sd_x$, R and r, and then substitute to find r. The estimation of r need not
be made in ignorance of whether the assumptions of linearity and
homoscedasticity can be met; an examination of the accessible scatter for the
uncurtailed range will reveal the facts.

Formula (10.11) indicates definitely that the magnitude of the correlation
coefficient is a function of the degree of heterogeneity with respect to one
of the traits being correlated. A better appreciation of the extent of this
influence may be had by examining Table 10.1, which gives, for several
values of R along the top and different $sd_x / SD_x$ ratios along the left, the
corresponding values of $r_{xy}$. It can be shown that double selection, i.e.,
curtailment on both variables, tends to depress the correlation coefficient.
Since the formulas for "correcting" for double curtailment are not too

One important rule emerges from the foregoing: standard deviations
should always be reported along with correlation coefficients, and some
indication should be given as to variation typically found for the variables.
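Formula (10.11) and its interchanged form can be sketched in Python as below; the function names are illustrative. The checks reproduce a few entries of Table 10.1, and the values in the last example are hypothetical.

```python
import math

def uncurtailed_r(r_xy, sd_x, SD_x):
    """Formula (10.11): estimate R_xy from the curtailed r_xy."""
    k = SD_x / sd_x
    return r_xy * k / math.sqrt(1 - r_xy ** 2 + r_xy ** 2 * k ** 2)

def curtailed_r(R_xy, sd_x, SD_x):
    """Formula (10.11) with SD_x and sd_x, R and r interchanged."""
    k = sd_x / SD_x
    return R_xy * k / math.sqrt(1 - R_xy ** 2 + R_xy ** 2 * k ** 2)

# Entries of Table 10.1: only the ratio sd_x / SD_x matters.
print(round(curtailed_r(0.30, 0.90, 1.0), 3))
print(round(curtailed_r(0.50, 0.70, 1.0), 3))
print(round(curtailed_r(0.80, 0.50, 1.0), 3))

# Hypothetical case: r = .35 in a group with sd = 8 where SD = 12.
print(round(uncurtailed_r(0.35, 8.0, 12.0), 3))
```

Applying one function and then the other with the same standard deviations returns the starting value, which is what interchanging the symbols in formula (10.11) implies.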

EFFECTS OF UNRELIABILITY 

Before considering the effect of unreliability, or errors of measurement,
on the correlation between two variables, it is necessary that we digress
to explain briefly what is meant by reliability. If we were assigned the
task of determining the height of an individual by the use of a tape
measure we might be satisfied with one measurement, but unfortunately a
single determination might not be entirely free from error. To overcome
this, two or more measures are averaged on the assumption that the chance
variable errors will more or less cancel out. If we compute the standard
deviation of the distribution of several measurements (of the same thing), a
summary figure indicating the possible magnitude of the variable errors
will be obtained. This S neither pertains to nor measures the magnitude of
a possible constant error, i.e., an error which affects all the measurements
in the same direction. We are here concerned only with the magnitude of
variable errors, or inaccuracies in measurement which are of a chance
nature.

Reliability. If we had the problem of determining the error in the
measurement of height, we could make several measurements on one
person and compute a measure of accuracy, or we might make just two
measures on each of several persons and take some function of the
difference between the two measurements for all N individuals as our gauge of
accuracy. Either scheme leads to an estimate of the size of the variable
error.

It is not always feasible or possible to
obtain more than two measures on an individual for a given trait; hence
it is necessary to use the second-mentioned scheme for determining the
accuracy of measurement. The mean or median absolute error may suffice,
but, as in physical measurement, we sometimes need to know the extent of
the error. Let

    X = an obtained score or measure for an individual
    $X_t$ = his true score
    e = a variable error, positive or negative

Then we can consider that

\[ X = X_t + e \]

or in deviation units

\[ x = x_t + e \]

The variance of the obtained scores will be

\[ S_x^2 = S_t^2 + S_e^2 \qquad (10.12) \]

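provided we can assume $x_t$ and e uncorrelated. If two comparable measures
are available for each individual, we may write

\[ x_1 = x_t + e_1, \qquad x_2 = x_t + e_2 \]

for the two scores. The reliability coefficient is the correlation between two
comparable measures of the same thing. Thus we have the reliability
coefficient

\[ r_{xx} = r_{12} = \frac{\Sigma x_1 x_2}{N S_1 S_2} = \frac{\Sigma (x_t + e_1)(x_t + e_2)}{N S_1 S_2} = \frac{\Sigma x_t^2 + \Sigma x_t e_2 + \Sigma x_t e_1 + \Sigma e_1 e_2}{N S_1 S_2} \]

Dividing by N gives

\[ r_{12} = \frac{S_t^2 + r_{t e_2} S_t S_{e_2} + r_{t e_1} S_t S_{e_1} + r_{e_1 e_2} S_{e_1} S_{e_2}}{S_1 S_2} \qquad (10.13) \]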


If we assume all three rs in the numerator equal to zero, we have

\[ r_{12} = \frac{S_t^2}{S_1 S_2} \]

It is assumed that we are correlating comparable measures of the same
thing or trait, comparable in the sense that $S_{e_1} = S_{e_2}$ and
$S_1 = S_2$. (The same trait is implied in that $x_1$ and $x_2$ are measures of
$x_t$.) Whence we have

\[ r_{xx} = \frac{S_t^2}{S_x^2} \qquad (10.14) \]

where $S_x = S_1 = S_2$. The reliability coefficient can be interpreted as a
proportion, since from formula (10.12) we have

\[ \frac{S_t^2}{S_x^2} + \frac{S_e^2}{S_x^2} = 1 \]


i.e., the reliability coefficient represents the proportion of the variance
of the obtained scores which is due to the variance of the true scores. It
follows that $1 - r_{xx}$ gives the proportion of the variance which is due to
errors of measurement.

Obviously, the reliability coefficient can, by substitution from formula
(10.14) into the foregoing expression, also be written as

\[ r_{xx} = 1 - \frac{S_e^2}{S_x^2} \qquad (10.15) \]

which indicates clearly that the reliability coefficient is a function of the
magnitude of the variable error relative to the variability of the trait in
question. It also follows from formula (10.15) that the error of measurement
can be stated in terms of the reliability coefficient and $S_x$, thus,

\[ S_e = S_x \sqrt{1 - r_{xx}} \qquad (10.16) \]

That $S_e$ is to be interpreted as the standard error of measurement may
be clarified if we note that, when x (= $x_1$ or $x_2$) is taken as evidence of
the true score, $x - x_t$ becomes the error, and the standard deviation of such
errors will be $S_e$, as can be shown by easy algebra. If it were possible to
secure a large number of measures on an individual, we would expect these
measures to distribute themselves normally about the true score with a
standard deviation corresponding to $S_e$. Thus, if the result of one testing
is an IQ of 80 and $S_e$ = 3, we can conclude with high confidence
that the individual's true position, on the scale of measured (obtained)
IQs, is somewhere between 71 and 89 (80 ± 3$S_e$), and with fair confidence
that it is somewhere between 74 and 86.
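Formula (10.16) and the bands in the IQ example can be sketched in Python. The values $S_x$ = 15 and $r_{xx}$ = .96 are assumptions chosen only because they yield the text's $S_e$ of 3; they are not given in the text.

```python
import math

def standard_error_of_measurement(s_x, r_xx):
    """Formula (10.16): S_e = S_x * sqrt(1 - r_xx)."""
    return s_x * math.sqrt(1 - r_xx)

def score_band(score, s_e, multiple):
    """Band of `multiple` standard errors of measurement around a score."""
    return score - multiple * s_e, score + multiple * s_e

# Assumed values (not from the text): S_x = 15 and r_xx = .96.
s_e = standard_error_of_measurement(15, 0.96)
print(round(s_e, 2))

# The text's example: obtained IQ of 80 with S_e = 3.
print(score_band(80, 3, 3))  # (71, 89)
print(score_band(80, 3, 2))  # (74, 86)
```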

Determination of reliability. The foregoing argument regarding the
interpretation of the reliability coefficient, either as an indicator of
relative accuracy or in terms of $S_e$, rests on the supposition that we have
obtained the coefficient as the result of correlating comparable measures
of the same thing and that the variable errors are uncorrelated with
themselves and with the true scores. The practical determination of the
reliability coefficient involves more, therefore, than the mere correlating of
two sets of measurements. The conditions under which the two sets of
scores are obtained must be scrutinized for possible violation of the
requisite assumptions. Some of the difficulties involved in ascertaining
reliability are considered below.

First let us note that the chance variable error, e, can be broken up into
many smaller components, at least logically, although not necessarily
experimentally. Thus we might set

\[ e = e_a + e_b + e_c + e_d + e_f + \cdots \]

in which

    $e_a$ = error in the instrument or test
    $e_b$ = error due to extraneous physical disturbance
    $e_c$ = error due to physiological condition of individual
    $e_d$ = error in scoring or in reading instrument
    $e_f$ = error due to day-to-day fluctuations

Other sources of variable error might be added, or some of those listed
might be broken up into more minute parts. It is not assumed that these
several sources contribute an equal amount to the variance of e, nor is it
assumed that these several components are entirely independent of each
other; for instance, daily fluctuations might be influenced by physiological
condition.

The assumption of uncorrelated errors implies that $e_1$ is not correlated
with $e_2$. Of course the two scores for an individual might by chance contain
a variable error of the same magnitude and sign; we are here interested,
however, in whether an error which is chance for one score might tend in
general to affect the second score in the same manner. For example, an
upset stomach might lead to a reduced performance score, and if the
second test was administered the same day, this same chance factor would
affect the second performance score in the same direction. Thus in examining
any proposed scheme for determining the reliability of a test, we must
inquire as to whether any of the sources of error can affect the two
measurements on an individual in the same direction. If it seems reasonable to
suspect that errors are correlated, it follows that the obtained reliability
coefficient will be spuriously high since the presence of correlated errors
will not allow formula (10.13) to be reduced to (10.14).

The presence of $e_a$ as a source of error may be appreciated if we regard
the items as providing a sampling of the performance (or responses) of
the individual. Consider the simple problem of measuring vocabulary
level. We can easily conceive of the individual's true score as the
proportion, $p_t$, of words in Webster's Unabridged Dictionary that he can
satisfactorily define. Instead of the time-consuming tedium of asking him
to define each word, we might resort to sampling. We could get a fairly
good sample of, say, n = 30 words by taking the fourth word from the top
of the right-hand column of every page ending in the numeral 55. The
standard error of his score would be given by $\sqrt{p_t q_t / 30}$, or more
generally by $\sqrt{p_t q_t / n}$. Even though $p_t$ is unknown (estimable as
$p_{ob}$), it is readily seen that the larger the number of items the smaller
the error, and vice versa. Thus to reduce $e_a$, as instrumental error, the
length of the test is increased. This general principle holds even though our
vocabulary illustration is not quite analogous to what goes on in test
construction because a "universe" of items is rarely, if ever, available for
sampling and because test constructors tend to improve on randomness by
selecting items that possess certain desirable characteristics.
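The dependence of this sampling error on test length can be illustrated with a short Python sketch. The true proportion of .50 is a hypothetical value, since $p_t$ is unknown in practice.

```python
import math

def se_of_proportion(p_t, n):
    """Standard error of a proportion score based on n sampled items."""
    q_t = 1 - p_t
    return math.sqrt(p_t * q_t / n)

# Hypothetical true proportion of .50; n = 30 as in the illustration,
# then tests two and four times as long.
for n in (30, 60, 120):
    print(n, round(se_of_proportion(0.50, n), 4))
```

Quadrupling the number of items halves the standard error, which is the sense in which lengthening a test reduces instrumental error.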

Let us consider a few of the “accepted” schemes for ascertaining 
reliability in order to see whether they are “acceptable” in light of the 
assumptions requisite to a sound reliability coefficient. These assumptions 
may be recapitulated in the form of three questions. Do the two tests or 
determinations represent measures of the same thing? Are the two series 
of measures comparable (comparable tests or instruments) ? Is it possible 
or likely that the errors of measurement are correlated; i.e., can the error 
on the first test be correlated with the error on the second, or can the error 
on either be correlated with the true measure ? 

For the ordinary mental, personality, or achievement test, reliability is 
usually ascertained by correlating supposedly equivalent (comparable?) 
forms, by correlating split halves (odd vs. even items or first half vs. 
second half of test), or by correlating test-retest scores. The test-retest 
method is of limited value in that there may be a memory carry-over from 
test to retest, in which case the retest will measure the same trait as the 
original test plus memory effects. In order to overcome this memory 
transfer, the retest may be administered some months after the first test, 
but this permits of a possible change in the trait or ability as a result of 
maturation or experience.

Split-half reliability involves the correlating of two halves and applying
the Brown-Spearman formula to determine the reliability of scores based
on the whole test. This formula is easily derived. Let $X_1$ and $X_2$ stand
for the respective halves. Now $r_{12}$ would be the reliability for scores
based on either half, but in practice we always use total scores, defined as
$X_a = X_1 + X_2$. The reliability of $X_a$ can be thought of in terms of the
correlation between $X_a$ and an imaginary set of comparable scores,
$X_b = X_3 + X_4$, where $X_3$ and $X_4$ are scores on the two respective
halves of a nonexistent form of the test. Given information about $X_1$ and
$X_2$, we seek an expression for $r_{ab}$. In deviation units,
$x_a = x_1 + x_2$ and $x_b = x_3 + x_4$; hence we may write

\[ r_{ab} = \frac{\Sigma x_a x_b}{N S_a S_b} = \frac{\Sigma (x_1 + x_2)(x_3 + x_4)}{N S_a S_b} = \frac{\Sigma x_1 x_3 + \Sigma x_1 x_4 + \Sigma x_2 x_3 + \Sigma x_2 x_4}{N S_a S_b} \]

Dividing through by N and utilizing formula (8.1), and with formula (9.9)
as a basis for specifying $S_a$ and $S_b$, we have

\[ r_{ab} = \frac{r_{13} S_1 S_3 + r_{14} S_1 S_4 + r_{23} S_2 S_3 + r_{24} S_2 S_4}{\sqrt{S_1^2 + S_2^2 + 2 r_{12} S_1 S_2} \, \sqrt{S_3^2 + S_4^2 + 2 r_{34} S_3 S_4}} \]

Now it is assumed that the $X_1$ and $X_2$ scores are comparable (equivalent
sets, with $S_1 = S_2$), and we simply say that our imaginary scores, $X_3$ and
$X_4$, are comparable with each other and also with $X_1$ and $X_2$; hence all
four Ss have the same value, and therefore cancel out, leaving

\[ r_{ab} = \frac{r_{13} + r_{14} + r_{23} + r_{24}}{\sqrt{2 + 2 r_{12}} \, \sqrt{2 + 2 r_{34}}} \]

Comparable or equivalent sets of scores will correlate equally with each
other, that is, the five unknown rs in this expression will all equal $r_{12}$,
our known value. Therefore we have

\[ r_{xx} = \frac{4 r_{12}}{\sqrt{2 + 2 r_{12}} \, \sqrt{2 + 2 r_{12}}} = \frac{2 r_{12}}{1 + r_{12}} \qquad (10.17) \]

as the reliability of scores based on the whole test.
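Formula (10.17) can be sketched in Python, together with a check that it agrees with the longer expression once all six correlations are set equal. The half-test correlation of .60 is a hypothetical value.

```python
import math

def whole_test_reliability(r12):
    """Formula (10.17): reliability of the whole test from the half-test r."""
    return 2 * r12 / (1 + r12)

def r_ab(r13, r14, r23, r24, r12, r34):
    """Correlation of total scores after the equal Ss have cancelled."""
    numerator = r13 + r14 + r23 + r24
    return numerator / (math.sqrt(2 + 2 * r12) * math.sqrt(2 + 2 * r34))

# Hypothetical correlation of .60 between odd and even halves.
r_half = 0.60
print(round(whole_test_reliability(r_half), 3))
print(round(r_ab(r_half, r_half, r_half, r_half, r_half, r_half), 3))
```

Both lines print the same value, which exceeds the half-test correlation because the whole test is twice as long as either half.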

The only assumption underlying formula (10.17) is that the two halves
being correlated are comparable (equivalent or parallel). If the test items
have been arranged according to difficulty, a first-half vs. second-half
reliability will not satisfy the notion of comparable measures. Ordinarily the
odd-even item technique will satisfy the criteria of comparability and
sameness of trait. Neither of the split-half methods will satisfy the
assumption of uncorrelated errors. Since both measures are determined at the
same sitting, any chance fluctuations due to physiological conditions or to
chance factors in the test situation will influence the two scores of an
individual in the same direction. It is to be expected, therefore, that the
correlation of halves will in general lead to a reliability coefficient which is
too high, giving us an exaggerated notion of the accuracy with which we can
place an individual on the trait continuum.

By far the best method for determining the reliability of a test is to have
two forms which have been made equivalent and comparable by careful
selection and balancing of items. No item in one form should be so nearly
identical with an item in the other form as to permit a direct memory
transfer. Two forms, equivalent yet not identical, can be administered
within, say, 2 weeks' time, a procedure which properly includes in the
estimate of variable error the daily fluctuations due to either physiological
or psychological conditions and variations due to chance factors in the
physical situation in which the tests are given. With so short an interval
between testings, the trait being measured will have changed only a
negligible amount as a result of maturation or ordinary environmental
influences.

The form versus form method for calculating reliability may reflect two
major sources of unreliability: instrumental error and trait instability.

If it is claimed that X indicates a person's position on a scale and if we
wish to know something of the precision of the score, we should so
determine $S_e$ as to include both major sources of error. High precision for a
score earned today is a necessary but not sufficient condition for score
stability; if the day-to-day variation happens to be large we cannot have a
very dependable score. Its lack of dependability associated with the
accident of measurement on a particular day should be incorporated in $S_e$.

When we attempt to obtain the reliability of a learning score or of any
performance which is influenced by practice, we encounter difficulties
which are baffling to the researcher who rigorously adheres to the
fundamental requisites of the reliability coefficient. The chief difficulty is
the obvious fact that the "thing" being measured changes as a result of each
measurement or trial. Test-retest, or first half vs. second half (of trials),
or today's trials vs. tomorrow's will not represent measures of the same
function, nor will any scheme analogous to equivalent forms avoid this
difficulty, since "forms" which are comparable will permit transfer. The
use of scores on odd vs. even trials will have the advantage of balancing
somewhat the influence of practice, especially if several trials are given,
but the possibility that a chance error affects odds and evens alike is present
in that a slip in the experimental procedure or a temporary discouragement
on the part of the testee or the adoption by the subject of a poor approach
to the problem will have a similar effect on both scores. If trials were
spaced, say, a day apart, the factors just mentioned might not greatly
disturb the reliability determination. In general, it can be said that the
same shortcomings are present in the aforementioned methods when
they are employed in determining the reliability of animal (or human)
maze-learning scores. Other techniques, peculiar to the maze situation,
have been proposed. Performances on the odd and even blinds, somewhat
similar to odd and even items, have been correlated for the purpose of
determining reliability, but since blinds differ considerably as regards
difficulty we cannot be sure that the two halves are comparable. We can also
question the comparability of the first half and second half of the maze since
in general the last part tends to be learned more quickly than the first.
Attempts to ascertain the reliability of one maze by correlating performance
on it with that on another maze involve several difficulties. In the
first place, there seems to be a general positive transfer; secondly,
the second maze must be similar to the first in order to satisfy the requisite
of comparable measures of the same ability, but if this similarity
approaches identity the second maze becomes a retest; and thirdly, a close
similarity will lead to possible interference which may
act differentially from animal to animal.

The foregoing brief discussion of the requisites for and difficulties in
arriving at a meaningful reliability coefficient should make obvious the
need for examining critically any proposed method of determining the
reliability of a psychological measurement. The interpretation of the
reliability coefficient in terms of the standard error of measurement
definitely assumes homoscedasticity, which is another way of saying that
the reliability coefficient is valid only when the error of measurement is of
the same order of magnitude for the entire range of scores. That this may
not hold is evident from the 1937 Stanford Revision of the Binet Test.

It should be noted that the magnitude of the reliability coefficient is
influenced by the trait homogeneity of the sample on which it is based.
Let sd represent the standard deviation for the restricted range, SD the
standard deviation for the unrestricted range, $r_{xx}$ the reliability for
the restricted range, and $R_{xx}$ the reliability for the unrestricted. If we
may assume that $S_e$ for the smaller range equals $S_e$ for the larger range,
we may write

\[ (sd)^2 (1 - r_{xx}) = (SD)^2 (1 - R_{xx}) \qquad (10.18) \]

as a formula from which we can infer $r_{xx}$ from $R_{xx}$, and vice versa.
The more homogeneous the group, the lower the reliability coefficient.
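Solving formula (10.18) for either reliability gives the two small Python functions below. The standard deviations and the reliability of .90 are hypothetical values.

```python
import math

def reliability_restricted(R_xx, sd, SD):
    """Solve (10.18) for r_xx, the reliability in the restricted group."""
    return 1 - (SD ** 2 / sd ** 2) * (1 - R_xx)

def reliability_unrestricted(r_xx, sd, SD):
    """Solve (10.18) for R_xx, the reliability in the unrestricted group."""
    return 1 - (sd ** 2 / SD ** 2) * (1 - r_xx)

# Hypothetical values: R_xx = .90 where SD = 15; restricted group has sd = 10.
r_xx = reliability_restricted(0.90, 10.0, 15.0)
print(round(r_xx, 3))
print(round(reliability_unrestricted(r_xx, 10.0, 15.0), 3))
```

Both functions rest on the assumption stated in the text that $S_e$ is the same for the two ranges.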
Attenuation. Now we return to the question which led to this lengthy
detour: How does unreliability affect the correlation between variables?

Let

\[ x = x_t + e \]

\[ y = y_t + d \]

where e and d represent the variable errors in the two scores, x and y. Then

\[ r_{xy} = \frac{\Sigma (x_t + e)(y_t + d)}{N S_x S_y} = \frac{\Sigma x_t y_t + \Sigma x_t d + \Sigma y_t e + \Sigma e d}{N S_x S_y} \]

If we assume that d is uncorrelated with $x_t$, that e is uncorrelated with
$y_t$, and that e and d are uncorrelated, we have

\[ r_{xy} = \frac{\Sigma x_t y_t}{N S_x S_y} = \frac{r_{tt} S_{x_t} S_{y_t}}{S_x S_y} \]

Since, in general, $S_t = S_x \sqrt{r_{xx}}$ by formula (10.14), we have

\[ r_{xy} = \frac{r_{tt} S_x \sqrt{r_{xx}} \, S_y \sqrt{r_{yy}}}{S_x S_y} \qquad (r_{tt} = r \text{ between true scores}) \]

\[ r_{xy} = r_{tt} \sqrt{r_{xx}} \sqrt{r_{yy}} \qquad (10.19) \]

which, since the reliability coefficients are less than unity, shows clearly
that the correlation between obtained scores will be less than the correlation
between true scores; i.e., errors of measurement tend to reduce or
attenuate the correlation between traits.

We can rearrange formula (10.19) as

$$r_{tt} = \frac{r_{xy}}{\sqrt{r_{xx} r_{yy}}} \qquad (10.20)$$

by which we can estimate what the correlation would be if perfect or
errorless measures were available. This is known as correction for
attenuation. Coefficients so corrected are of theoretical rather than
practical value since they cannot be used in prediction equations. The
prediction of one variable from another and the error
of estimate must necessarily be based on obtained, or fallible, rather
than true scores.
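Formulas (10.19) and (10.20) are inverses of one another, which a few lines of code make concrete. This is only an illustrative sketch; the function names and the sample reliabilities are hypothetical.

```python
import math


def attenuated_r(r_tt, r_xx, r_yy):
    """Formula (10.19): correlation between obtained (fallible) scores."""
    return r_tt * math.sqrt(r_xx * r_yy)


def corrected_r(r_xy, r_xx, r_yy):
    """Formula (10.20): correction for attenuation."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical case: true-score correlation .60, reliabilities .80 and .70.
r_xy = attenuated_r(0.60, 0.80, 0.70)
print(round(r_xy, 3))                            # below .60
print(round(corrected_r(r_xy, 0.80, 0.70), 3))   # recovers .60
```

Because the correction divides by a quantity less than unity, sampling fluctuation in any of the three coefficients can push a corrected value above 1.00.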
[...] of the reliability of fallible scores. By reference to formula (10.19)
we observe that, if the correlation between true scores is perfect and if
each variable is subject to errors of measurement, the correlation between
obtained scores will equal the square root of the product of the two
reliability coefficients. Obviously, if the reliabilities are equal, the
correlation cannot be greater than the reliability coefficient. [...]

Corrected rs greatly in excess of unity have been reported. [...] Since the
assumptions are difficult to meet, doubt may be raised concerning any
corrected r [...]. It has been [...] can legitimately exceed unity by as
much as [...] times its sampling error. Formulas for the standard error of
a corrected r are available, but nothing is known concerning the nature of
the distribution of corrected rs for successive samples. Presumably the
distribution would be markedly skewed for high values; hence the use of the
ordinary standard error technique to determine whether a corrected r [...]
(or other magnitude) by [...]

[...] tests for the difference between means. It seems reasonable to presume
that $D_M = M_1 - M_2$ would not be [...] increased by errors of measurement
because [...] increased by such errors [see formula (10.12)]. For correlated
means [...] increased by measurement errors because the variance of the
differences [...]
are systematically reduced by errors of measurement. Moral: the use of
unreliable measures is not conducive to the finding of significant
differences.

Slope and measurement errors. The slope for the regression of Y on
X, $B_{yx} = r(S_y/S_x)$, can be written in terms of the correlation between true
scores and the Ss for true scores by utilizing (10.20) and (10.14). Thus,
with $S_{t_x}$ and $S_{t_y}$ standing for the standard deviations of the true scores,

$$B_{yx} = r_{xy}\frac{S_y}{S_x} = r_{tt}\sqrt{r_{xx}}\sqrt{r_{yy}}\;\frac{S_{t_y}/\sqrt{r_{yy}}}{S_{t_x}/\sqrt{r_{xx}}} = r_{xx}\, r_{tt}\frac{S_{t_y}}{S_{t_x}}$$

from which we see that the slope in terms of true scores is larger than
the slope based on obtained scores; that is, the slope is reduced by errors
in X. Note that the errors in Y do not affect the slope; this is reasonable
because for a fixed value of X (or for an interval on X) the average value
of Y will involve a balancing of the chance errors in the Ys, that is, the
means of the vertical arrays will not be systematically affected by the
errors in the Ys. Therefore the slope of the line “fitting” the array means
will not be systematically affected by the Y errors. For those primarily
interested in slopes, it would seem unimportant to have high reliability for
Y, but it must be noted that the sampling variance of $B_{yx}$ will be increased
by errors in Y via an increase in the residual variance.

Reliability of difference scores. There are three situations for which 
we may wish to say whether two scores differ more than expected on the 
basis of errors of measurement. First, two persons each with a score on a 
given test. Second, a change score based on an “initial” and a “final” 
score for a person. Third, the difference score for a person on two different 
tests. 

For the first situation, the standard error of the difference is given by

$$\sqrt{S_e^2 + S_e^2} = S_e\sqrt{2}$$

A difference would need to be approximately $2S_e\sqrt{2}$ to be significant at
the .05 level.

For the second situation, let us be a bit unconventional in terminology
by letting Y stand for an initial score and X stand for a second score, on
the same variable, taken after an experience or time interval that might
produce a change. (Ordinarily, we let $X_1$ and $X_2$ or $X_i$ and $X_f$ stand for
such scores; the Y and X notation will have certain conveniences in the
sequel.) Thus the change score is given by $D = X - Y$ or, in deviation
units, as $d = x - y$. Although Y and X are based on the same instrument
or test, the second score, or X, might be regarded as measuring the
same thing as Y plus the experimentally produced effect, variable over
individuals.

Let the error in D be represented by

$$h = d - d_t = (x - y) - (x_t - y_t) = (x - x_t) - (y - y_t)$$

or

$$h = e_x - e_y$$

Assuming that the errors in X and Y are uncorrelated, we have by the
variance theorem,

$$S_h^2 = S_{e_x}^2 + S_{e_y}^2 = S_x^2(1 - r_{xx}) + S_y^2(1 - r_{yy}) \qquad (10.21)$$

the square root of which will yield the standard error (of measurement) for
the change score. In order to determine the reliability coefficient for change
(difference) scores, we may utilize (10.15) with a shift to subscripts
appropriate to the present problem:

$$r_{dd} = 1 - \frac{S_h^2}{S_d^2}$$

in which $S_d^2$ is the variance of difference (change) scores and is given (see
p. 81) by

$$S_d^2 = S_x^2 + S_y^2 - 2r_{xy}S_xS_y$$

Substituting, we get

$$r_{dd} = 1 - \frac{S_x^2(1 - r_{xx}) + S_y^2(1 - r_{yy})}{S_x^2 + S_y^2 - 2r_{xy}S_xS_y} \qquad (10.22)$$


An estimate of $r_{dd}$ based solely on the N cases in an investigation
involving change scores poses a problem: how to ascertain the needed $r_{xx}$
and $r_{yy}$ values? Apparently, the most feasible procedure would be to use
the odd-even, Spearman-Brown, method for calculating each value. We
might, however, secure a fair approximation of $r_{dd}$ when the scores are
based on a standardized test or a scale of known reliability, provided it
seems safe to assume that the error of measurement variance is the same for
the cases at hand as that for the group on which the known reliability
coefficient was calculated, and provided it can be assumed that the error of
measurement variances for the first and second set of scores on our N
cases are equal. The first assumption is tenable unless our N cases are
highly atypical compared with those in the group yielding the known
reliability coefficient. (Note that we need to assume the equivalence
of two $S_e$ values for this first assumption, not the equivalence of two
reliability coefficients. Why?) The second assumption would be questionable
if the imposed experimental condition led to drastic changes (unlikely
in most investigations). The approximation of $r_{dd}$ would be obtained by
replacing each of the numerator terms of (10.22) by the already available
$S_e^2$ of the test. Since the odd-even method is less fraught with assumptions,
its use is preferable.

It is of interest to simplify (10.22) to

$$r_{dd} = \frac{r_{xx}S_x^2 + r_{yy}S_y^2 - 2r_{xy}S_xS_y}{S_x^2 + S_y^2 - 2r_{xy}S_xS_y} \qquad (10.23)$$

and then note what happens when $S_x = S_y$ (i.e., when no change in
variation occurs). The Ss cancel, giving

$$r_{dd} = \frac{r_{xx} + r_{yy} - 2r_{xy}}{2 - 2r_{xy}} \qquad (10.24)$$

Under these conditions $r_{yy}$ could very well equal $r_{xx}$, so that we would have

$$r_{dd} = \frac{r_{xx} - r_{xy}}{1 - r_{xy}} \qquad (10.25)$$

which makes it quite apparent that as $r_{xy}$ approaches $r_{xx}$ the value of $r_{dd}$
approaches zero, a proposition that also tends to hold for $r_{dd}$ by way of
(10.23) and its exact equivalent, (10.22). This means that for change
scores to be reliable the experimentally produced effect must lead to a
shift in the ordering of individuals (regardless of the presence or absence
of an over-all mean change). With an $r_{xx}$ of, say, .90 and $r_{xy}$ of, say, .80,
the value of $r_{dd}$ by (10.25) is only .50.
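The behavior of (10.22) is easy to explore numerically. The sketch below is one possible rendering, not a prescribed procedure; the function name and the second set of values are hypothetical.

```python
def reliability_of_difference(r_xx, r_yy, r_xy, s_x=1.0, s_y=1.0):
    """Formula (10.22): reliability of the difference (change) score X - Y."""
    error_variance = s_x ** 2 * (1.0 - r_xx) + s_y ** 2 * (1.0 - r_yy)  # (10.21)
    difference_variance = s_x ** 2 + s_y ** 2 - 2.0 * r_xy * s_x * s_y
    return 1.0 - error_variance / difference_variance


# The case in the text: equal Ss, both reliabilities .90, r_xy = .80.
print(round(reliability_of_difference(0.90, 0.90, 0.80), 2))

# Hypothetical unequal-S case, which (10.23) must reproduce exactly.
print(round(reliability_of_difference(0.90, 0.85, 0.60, s_x=10.0, s_y=12.0), 3))
```

Raising `r_xy` toward `r_xx` while holding the reliabilities fixed drives the returned value toward zero, which is the point of (10.25).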

Although the foregoing was couched in terms of experimentally produced
changes, it is obvious that the same deductions hold for long-time
changes in longitudinal studies. For either case the reliability of change
scores may be surprisingly low despite high reliability for initial and final
scores. The most serious consequence occurs when changes on one
variable are being correlated either with changes on another variable or
with scores on some other variable: the rs will be attenuated, sometimes
so much as to make it difficult to obtain statistically significant rs.

The (usual) unreliability of change scores poses a paradox when the
question of the significance of the mean, or over-all, change is considered.
Suppose $S_x = S_y$ and suppose the mean change is appreciable, say half of
S in magnitude, and suppose further that $r_{xy}$ is very nearly equal to $r_{xx}$
with a consequent $r_{dd}$ of near zero. How could a mean change based on
changes so lacking in reliability possess statistical significance? In
answering this, it should be noted that two things happen as $r_{xy}$ approaches
$r_{xx}$ as its limit: not only do the change scores become more unreliable
but also the standard error of the difference (= standard error of the
mean change) is progressively reduced by the increasing $r_{xy}$ in the standard
error of the difference between initial and final means. Such an occurrence
(very high $r_{xy}$ and a substantial mean change) is highly academic because
produced changes are usually different from person to person rather than
nearly constant over persons as implied by a very high $r_{xy}$.

The third situation for which we may need and in fact should have
information regarding the reliability of difference scores occurs when for
each of several persons we have a difference between scores on two different
tests, or variables. For such difference scores to have meaning, the two
sets of scores should be in comparable units such as standard scores or
scores with equal means and Ss. Under these conditions the reliability
of difference scores will be given by (10.24) with the subscripts referring
to the two different variables, or tests. This time it is high correlation
between the two variables that tends to lead to unreliable difference
scores even though $r_{xx}$ and $r_{yy}$ are satisfactorily high. Again, the
unreliability will limit the degree of correlation of such difference scores with
other differences or with other variables. The instability of difference
scores would seem to limit their possible usefulness in diagnostic work or
in guidance programs. But it should be noted that even though difference
scores may not provide a very reliable basis for differentiating among
individuals, a difference for a particular individual may be dependable
provided it is sufficiently large, say $1.96S_h$, for which $S_h$ would be
obtained as the square root of the middle or right-hand part of (10.21),
with the components in standard score form. Needless to say, even large
differences may not have diagnostic or guidance significance; empirical
study is needed to demonstrate their value.

Measurement errors and a regression phenomenon. Suppose we have
the scatterplot for the scores on two comparable forms of a test for N
persons. With a form versus form correlation, or reliability coefficient, of,
say, .85, the regression line for the second form score on the first score (or
the line one might use to predict $X_2$ from $X_1$) will have a slope of .85
(assuming $S_1 = S_2$), which means, of course, that those initially below
average and those initially above average have “regressed” toward the
mean on the second testing. That is, there would seem to be a tendency for
the initially low to gain and the initially high to lose, which implies a
negative correlation between initial score and gain.

Let’s take an algebraic look at the situation.

Let $X_i$ = initial score

$X_f$ = final score

Then $X_f - X_i$ = gain, or in deviation units $x_f - x_i = x_g$ = gain

$$r_{ig} = \frac{\sum x_i x_g}{N S_i S_g} = \frac{\sum x_i(x_f - x_i)}{N S_i S_g} = \frac{\sum x_i x_f - \sum x_i^2}{N S_i S_g}$$

$$r_{ig} = \frac{r_{if}S_iS_f - S_i^2}{S_iS_g} = \frac{r_{if}S_f - S_i}{\sqrt{S_i^2 + S_f^2 - 2r_{if}S_iS_f}} \qquad (10.26)$$
When $S_i$ and $S_f$ are so nearly equal that they cancel out, (10.26) reduces to

$$r_{ig} = \frac{r_{if} - 1}{\sqrt{2 - 2r_{if}}} \qquad (10.27)$$

as the correlation between gain and initial score. [...] level if N is in
excess [...] the initially high. [...] in follow-up studies involving [...]

This negative correlation is attributable in part to the effect of
unreliability on $r_{if}$ and in part [...] differential changes from initial
to final. [...] concluding that gain and [...] rid of that part caused by
[...] deviation units would be $x'_2 = r_{12}x_1 = r_{xx}x_1$. [...]

$$X'_2 = r_{xx}X_1 + M_2 - r_{xx}M_1$$

with the unknown $M_2$ taken as equal to $M_1$. Then the gain would be taken
[...]

In terms of deviation units we have [...]; hence the correlation between
the adjusted initial scores, $x'_i = r_{xx}x_i$, and gains from the adjusted
initial scores, $g' = x_f - r_{xx}x_i$, becomes

$$r_{i'g'} = \frac{\sum r_{xx}x_i(x_f - r_{xx}x_i)}{N S_{i'} S_{g'}} = \frac{r_{xx}r_{if}S_iS_f - r_{xx}^2S_i^2}{r_{xx}S_i\sqrt{S_f^2 + r_{xx}^2S_i^2 - 2r_{if}r_{xx}S_iS_f}}$$

$$r_{i'g'} = \frac{r_{if}S_f - r_{xx}S_i}{\sqrt{S_f^2 + r_{xx}^2S_i^2 - 2r_{if}r_{xx}S_iS_f}} \qquad (10.28)$$

[...] correlation between [...] the same conditions [...] (10.16) [...]

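Another approach to the elimination of the effect of errors of
measurement is to work with estimates of true scores.
We can write a prediction equation as

$$x'_t = r_{tx}\frac{S_t}{S_x}x \qquad (10.29)$$

in which

$$r_{tx} = \frac{\sum x_t x}{N S_t S_x} = \frac{\sum x_t(x_t + e)}{N S_t S_x} = \frac{\sum x_t^2 + \sum x_t e}{N S_t S_x}$$

As earlier, it is assumed that $x_t$ and $e$ are uncorrelated, so we have

$$r_{tx} = \frac{\sum x_t^2}{N S_t S_x} = \frac{S_t}{S_x}, \quad \text{which from (10.14)} = \sqrt{r_{xx}}$$

Also from (10.14) we have $S_t = S_x\sqrt{r_{xx}}$.
Substituting in (10.29),

$$x'_t = \sqrt{r_{xx}}\,\frac{S_x\sqrt{r_{xx}}}{S_x}\,x$$

$$x'_t = r_{xx}x \qquad (10.30)$$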
Note that the estimate, $x'_t$, is identical to $x'_2$ estimated from $x_1$ or to $x'_f$
estimated from $x_i$, both $x'_2$ and $x'_f$ being estimates under the condition of
no changes except those due to errors of measurement. We see now that in
saying that $x'_2$ or $x'_f$ may be regarded as a sort of adjusted value for the
first or initial score, we are in effect so adjusting the initial score as to
yield an estimate of the true score. The $x'_t$ values are referred to as
regressed scores. Obviously, the utilization of regressed initial scores as a
basis for determining whether or not gains are correlated with initial
(regressed) scores will lead to (10.28). Conceptually, it may seem preferable
to think in terms of $x'_t$ as an adjusted initial score that makes allowance
for the error of measurement part of the regression of final on initial
scores.
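The contrast between correlating gains with obtained initial scores and with regressed initial scores can be checked numerically. The sketch below assumes the gain-versus-initial correlation takes the form (r_if S_f - S_i) / sqrt(S_i^2 + S_f^2 - 2 r_if S_i S_f), and that replacing the initial score by its regressed value r_xx x_i changes S_i to r_xx S_i throughout; the function names are hypothetical.

```python
import math


def r_initial_gain(r_if, s_i, s_f):
    """Correlation between obtained initial score and gain (final - initial)."""
    return (r_if * s_f - s_i) / math.sqrt(
        s_i ** 2 + s_f ** 2 - 2.0 * r_if * s_i * s_f
    )


def r_regressed_initial_gain(r_if, r_xx, s_i, s_f):
    """Same correlation with the initial score replaced by r_xx * x_i."""
    return r_initial_gain(r_if, r_xx * s_i, s_f)


# Two comparable forms, r_if = .85, equal Ss, and no real change at all.
print(round(r_initial_gain(0.85, 1.0, 1.0), 3))                  # negative
print(round(r_regressed_initial_gain(0.85, 0.85, 1.0, 1.0), 3))  # zero
```

When the only source of change is error of measurement (so that r_if equals r_xx), the apparent negative relation between initial standing and gain vanishes once regressed initial scores are used.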

Matched group fallacy. Occasionally, in the comparison of group
changes, either long-term or experimentally produced, we may be dealing
with samples from two populations that are known to differ appreciably;
that is, random sampling will yield groups that will differ in initial score
level. In order to have groups with comparable initial standing, cases are
paired, a procedure which is ordinarily desirable but which for this given
situation introduces a difficulty in that the matching will involve pairing
some persons from the top half of one population with persons from the
bottom half of the other population. Upon subsequent testing and without
any interpolated change-producing experience, one group will show a gain
while the other group will show a loss, but this
represents nothing more than the regression of scores in each group
toward the respective mean of the larger group from which it was drawn.
And of course, with change-producing conditions this type of regression
due to errors of measurement will contaminate (either increase or decrease)
any real difference in change between the two groups.

A second type of situation in which matching may be disrupted by
measurement errors occurs when pairing is used to obtain two groups that
are comparable on some variable, Y, which should be controlled in making
a comparison on variable X. If the matching on Y involves pairing a
person from the upper half of one supply group with a person from the
lower half of the other supply group, the two matched samples so set up
will tend to have equal means for Y, but will nevertheless differ on the
Y variable because of the errors in the Y scores used in pairing. An
immediate retest on Y will show that those from the top part of one
“population” (supply) will average lower on Y, whereas those from the
bottom portion of the other supply will average higher on Y, than
originally.

For either of these situations (control on initial score or on another
variable) the failure to obtain really comparable groups will occur even
though not all pairs involve top-half versus bottom-half status. The
larger the percentage of such pairs, the greater the disruption. For both
situations the matching should be done on regressed
scores. Separate regression equations of the form $x'_t = r_{xx}x$ (or $y'_t
= r_{yy}y$) will be needed for each group from which the (matched) samples
are to be formed, with the value of $r_{xx}$ (or $r_{yy}$) taken as the reliability most
appropriate for the particular group. The calculation of the regressed
scores is facilitated by casting the regression equation into raw score form:

$$X'_t = r_{xx}X + M_x - r_{xx}M_x$$
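The raw-score form of the regressed score, r_xx X + M_x - r_xx M_x, shows directly why equal obtained scores from different supply groups are not equivalent. The following is a minimal sketch under assumed values; the group means, the common reliability of .90, and the function name are all hypothetical.

```python
def regressed_score(x, r_xx, group_mean):
    """Estimated true score in raw-score form: r_xx*X + M_x - r_xx*M_x."""
    return r_xx * x + group_mean - r_xx * group_mean


# Two persons "matched" at an obtained score of 110, drawn from supply
# groups whose means are 120 and 100 (hypothetical figures).
from_high_group = regressed_score(110.0, 0.90, 120.0)
from_low_group = regressed_score(110.0, 0.90, 100.0)
print(round(from_high_group, 1), round(from_low_group, 1))
```

The person drawn from the higher group has the higher estimated true score, so the pair is not in fact matched; pairing on the regressed values avoids this.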


INDEX CORRELATION 

A possible source of error in correlational work may be introduced when
indexes (ratios) having a common variable denominator are correlated, such
as X/Z and Y/Z. Before considering this special case, it might be well to
turn our attention to more general formulas for indexes. These formulas
involve the coefficient of variation, namely, $v = S/M$, and their use leads
to serious error when the vs are large, $v^3$ and higher-power terms having
been dropped in the derivations.

Let $I = X_1/X_2$; then it can be shown that the mean and standard
deviation of such an index or ratio will be approximately

$$M_I = \frac{M_1}{M_2}\left(1 - r_{12}v_1v_2 + v_2^2\right) \qquad (10.31)$$

$$S_I = \frac{M_1}{M_2}\sqrt{v_1^2 - 2r_{12}v_1v_2 + v_2^2} \qquad (10.32)$$

In the case of four variables, the following formula for the correlation of
the indexes $X_1/X_3$ and $X_2/X_4$ will yield a good approximation:

$$r_{(X_1/X_3)(X_2/X_4)} = \frac{r_{12}v_1v_2 - r_{14}v_1v_4 - r_{23}v_2v_3 + r_{34}v_3v_4}{\sqrt{v_1^2 + v_3^2 - 2r_{13}v_1v_3}\,\sqrt{v_2^2 + v_4^2 - 2r_{24}v_2v_4}} \qquad (10.33)$$

Although these formulas are very useful, their use is somewhat limited in
that generally we cannot know whether the index distribution is normal,
nor can we make a statement concerning linearity and homoscedasticity
for the correlation between two indexes. Such information, if needed,
must be obtained by first determining the numerical value of the indexes for
each individual and then making distributions.

Several special cases can be deduced from formula (10.33). Thus the
correlation between $X_1/X_3$ and $X_2$ can be obtained by writing $X_2$ as
$X_2/1$, i.e., $X_4$ is set equal to 1, which makes $v_4 = 0$, and
therefore all terms involving the subscript 4 vanish. The correlation between
$X_1/X_3$ and the reciprocal of a variable would be obtained by setting $X_2 = 1$,
i.e., letting $1/X_4$ be the reciprocal; then $v_2 = 0$, whence the desired formula
can be obtained by dropping all terms involving $v_2$. Likewise the correlation
can be deduced for $1/X_3$ with $1/X_4$, for $1/X_3$ with $X_2$, and for $X_1/X_3$
with $X_2/X_3$. This last correlation is of particular interest because it is
possible to find a relationship between these two indexes even though the
three original variables are uncorrelated.

By substituting $X_3$ for $X_4$, i.e., replacing subscript 4 by 3, an expression
for the correlation of indexes having a common variable denominator can
readily be obtained. It will be

$$r_{(X_1/X_3)(X_2/X_3)} = \frac{r_{12}v_1v_2 - r_{13}v_1v_3 - r_{23}v_2v_3 + v_3^2}{\sqrt{v_1^2 + v_3^2 - 2r_{13}v_1v_3}\,\sqrt{v_2^2 + v_3^2 - 2r_{23}v_2v_3}} \qquad (10.34)$$


If $r_{12} = r_{13} = r_{23} = 0$, this becomes

$$r_{(X_1/X_3)(X_2/X_3)} = \frac{v_3^2}{\sqrt{v_1^2 + v_3^2}\,\sqrt{v_2^2 + v_3^2}}$$

and if the vs are equal, the value of the index correlation will be .50 even
though there is no relationship between the original variables. This is
known as spurious correlation due to indexes. There are instances, however,
in which an analysis of the interrelations of ratios is of just as much
import as the analysis of the variables from which the indexes are obtained,
and therefore it does not follow that the correlation between ratios having
a common denominator is necessarily misleading.
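The .50 figure can be reproduced from the approximation for two ratios X1/X3 and X2/X4, here taken in the standard form attributed to Pearson: numerator r12 v1 v2 - r14 v1 v4 - r23 v2 v3 + r34 v3 v4, denominator the product of sqrt(v1^2 + v3^2 - 2 r13 v1 v3) and sqrt(v2^2 + v4^2 - 2 r24 v2 v4). The sketch below rests on that assumption; the function names and the coefficient of variation of .10 are hypothetical.

```python
import math


def index_correlation(v1, v2, v3, v4, r12, r13, r14, r23, r24, r34):
    """Approximate correlation between the ratios X1/X3 and X2/X4."""
    numerator = r12 * v1 * v2 - r14 * v1 * v4 - r23 * v2 * v3 + r34 * v3 * v4
    denominator = math.sqrt(v1 ** 2 + v3 ** 2 - 2.0 * r13 * v1 * v3) * math.sqrt(
        v2 ** 2 + v4 ** 2 - 2.0 * r24 * v2 * v4
    )
    return numerator / denominator


def common_denominator_correlation(v1, v2, v3, r12, r13, r23):
    """Special case X4 = X3: v4 = v3, r34 = 1, r14 = r13, r24 = r23."""
    return index_correlation(v1, v2, v3, v3, r12, r13, r13, r23, r23, 1.0)


# Three mutually uncorrelated variables with equal coefficients of variation.
print(round(common_denominator_correlation(0.10, 0.10, 0.10, 0.0, 0.0, 0.0), 2))
```

Making the denominator variable less variable than the numerators (a smaller `v3`) shrinks the spurious component, as the zero-correlation formula implies.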

It has been asserted that the correlation between IQs derived from two 
tests or two forms of the same test will be spuriously high because of the 
common variable denominator, age. It can be shown, however, that such 
a correlation will not be spurious unless the two sets of IQs are correlated 
with age. If the IQ-vs.-age correlations are both positive or both negative, 
the index correlation will be spuriously high; if one is negative and the 
other positive, spuriously low. Thus, rather than make a blanket statement 
to the effect that the correlation between IQs is spuriously high, we should 
say that it can be spuriously high or low or not spurious at all, according 
to the IQ-vs.-age correlations. It should be remembered that, even though
the IQs based on an ideal (properly constructed and standardized) test
will be uncorrelated with age, a nonzero relationship might be produced
for a single school-grade group by the selective factors that operate in
age-grade location. Within a single grade group in a school system where
acceleration is permitted, the younger children are likely to be the brighter,
i.e., have the higher IQs, thus producing negative correlations for sets of
IQs with age, and consequently a spuriously high correlation between IQs.

PART-WHOLE CORRELATION

Another type of spurious correlation arises when a total score is
correlated with a subscore which is a part of the total score. Suppose that a total
score is made up of three parts, $X_T = X_1 + X_2 + X_3$, and that we
correlate $X_1$ against $X_T$. Ordinarily in such situations the components will
themselves be correlated positively. It should be obvious that the extent
to which $X_1$ correlates with $X_T$ is more or less dependent on the fact that
$X_T$ includes $X_1$. It does not follow, however, that a high value for $r_{1T}$ is
not meaningful, even though spurious. For instance, a high value for $r_{1T}$
would, regardless of spuriousness, justify the use of $X_1$ in lieu of the
battery of three subtests. There are times when we may wish to know how
highly a subtest correlates with a total, based on any number of parts,
minus the subtest. This correlation is given by

$$r_{1(T-1)} = \frac{r_{1T}S_T - S_1}{\sqrt{S_T^2 + S_1^2 - 2r_{1T}S_1S_T}} \qquad (10.35)$$
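The correlation of a part with the remainder of the total, (r_1T S_T - S_1) / sqrt(S_T^2 + S_1^2 - 2 r_1T S_1 S_T), can be illustrated with a case where the answer is known in advance. The sketch assumes three uncorrelated subtests of unit standard deviation, so that the total has S_T = sqrt(3) and each part correlates 1/sqrt(3) with it; the function name is hypothetical.

```python
import math


def part_remainder_correlation(r_1T, s_1, s_T):
    """Correlation of subtest X1 with the total minus X1."""
    return (r_1T * s_T - s_1) / math.sqrt(
        s_T ** 2 + s_1 ** 2 - 2.0 * r_1T * s_1 * s_T
    )


# Three uncorrelated parts, each with S = 1: the part-whole r is about .58,
# yet the part is unrelated to the rest of the battery.
s_T = math.sqrt(3.0)
r_1T = 1.0 / math.sqrt(3.0)
print(round(r_1T, 2))
print(round(abs(part_remainder_correlation(r_1T, 1.0, s_T)), 6))
```

The whole of the part-whole correlation in this constructed case is spurious in the sense used above, since the corrected value is zero.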


HETEROGENEITY WITH RESPECT TO A 
THIRD VARIABLE 

We have already discussed the influence on r of heterogeneity with
regard to one or both the variables being correlated. Suppose variables $X_1$
and $X_2$ are two different traits, each of which is related to age as the third
variable. Then an older individual will tend to be higher on both tests than
a younger individual. In other words, heterogeneity with respect to age will
tend to produce correlation between $X_1$ and $X_2$, and our present problem is
to develop a method for correcting $r_{12}$ so that we can estimate what the
correlation between $X_1$ and $X_2$ would be if age were constant.

Suppose $r_{12}$, $r_{13}$, $r_{23}$, and the several means and standard deviations are
known; then let us visualize the three scatter diagrams. The scatter for $r_{12}$
will be somewhat elongated as a result of the influence of age, since
variation in both $X_1$ and $X_2$ is here supposed to be partly due to age
variation. What is needed is the correlation, between measures of $X_1$ and
$X_2$, which has been freed from the influence of age. If we were to express
each $X_1$ in the first array of the scatter for $r_{13}$ as a deviation from the mean
of this array and were to do the same for all other $X_1$s in the scatter, each
as a deviation from the mean of the array in which it falls, we would have
scores expressed as deviations from the means of the several ages. These
deviations will be independent of age. As an example, suppose an 8-year-old
individual scores 28 and the mean of 8-year-olds is 25, and a 14-year-old
individual scores 54 and the mean of 14-year-olds is 51. The second
individual scores higher than the first because he is older, but each would
have a deviation (from his own age mean) of plus 3. Obviously, if we also
expressed the $X_2$ scores as deviations from the averages for the several
ages they too would be independent of age influences. Now, if we
correlated these deviations (from age means) we would be correlating sets of $X_1$
and $X_2$ scores which would be free from age, and hence we would arrive at
a correlation, between variables $X_1$ and $X_2$, which would not be affected
by age.
Partial correlation. The task of determining the correlation between
two variables, with the influence of a third eliminated, can always be
accomplished by actually computing all the deviations and then making
a scatter diagram from which the r can be determined. However, in those
cases in which we can assume linearity of regression for $X_1$ on $X_3$ and $X_2$ on
$X_3$ it is possible to set up a method for determining the desired correlation
from the three correlation coefficients between the three variables. If
linearity exists we can correlate the deviations from the two regression
lines instead of from the array means (or means for several ages if age is the
third variable). Since

$$x'_1 = r_{13}\frac{S_1}{S_3}x_3 \quad \text{and} \quad x'_2 = r_{23}\frac{S_2}{S_3}x_3$$

the two sets of deviation-from-regression scores will be

$$x_1 - x'_1 = x_1 - r_{13}\frac{S_1}{S_3}x_3 \quad \text{and} \quad x_2 - x'_2 = x_2 - r_{23}\frac{S_2}{S_3}x_3$$

The correlation of these deviation scores, which is designated by the
symbol $r_{12.3}$ (read: the correlation between $X_1$ and $X_2$ with $X_3$ held
constant) and known as the partial correlation coefficient, becomes

$$r_{12.3} = \frac{\sum (x_1 - x'_1)(x_2 - x'_2)}{N S_{x_1 - x'_1} S_{x_2 - x'_2}} = \frac{\sum \left(x_1 - r_{13}\frac{S_1}{S_3}x_3\right)\left(x_2 - r_{23}\frac{S_2}{S_3}x_3\right)}{N S_{x_1 - x'_1} S_{x_2 - x'_2}}$$

Multiplying and summing the numerator, and noting that the Ss in the
denominator are nothing more than the errors of estimate, $S_{1.3}$ and $S_{2.3}$,
we have

$$r_{12.3} = \frac{\sum x_1x_2 - r_{23}\frac{S_2}{S_3}\sum x_1x_3 - r_{13}\frac{S_1}{S_3}\sum x_2x_3 + r_{13}r_{23}\frac{S_1S_2}{S_3^2}\sum x_3^2}{N S_1\sqrt{1 - r_{13}^2}\; S_2\sqrt{1 - r_{23}^2}}$$

Dividing by N, cancelling Ss, and collecting like terms, we get

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} \qquad (10.36)$$


This formula definitely assumes the linearity of the two regression lines
for predicting $X_1$ and $X_2$ from $X_3$. Whether we correlate deviations from
array means or use formula (10.36), we end with a correlation which has
been freed of the influence of the third, or eliminated, variable. If, for
example, age is the third variable, the partial correlation coefficient
represents an estimate of what the correlation would be if we held age
constant by the use of individuals of any one of the several age levels
present in the original group.

The difference between $r_{12.3}$ and $r_{12}$ indicates how much of the correlation
between variables 1 and 2 is due to the influence of heterogeneity of a third
variable. Obviously, if the third variable is unrelated to $X_1$ and $X_2$, the
partial r will equal $r_{12}$, and if either $r_{13}$ or $r_{23}$ (but not both) is negative
and $r_{12}$ positive, “partialing out” $X_3$ will raise the correlation. Is this
reasonable?
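The first-order partial, (r12 - r13 r23) / [sqrt(1 - r13^2) sqrt(1 - r23^2)], is simple to compute, and trying a few values helps with the question just posed. This is an illustrative sketch only; the function name and the correlations used are hypothetical.

```python
import math


def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3, assuming linear regressions on X3."""
    return (r12 - r13 * r23) / math.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))


# Both traits related to age (.70 each): the partial is well below r12 = .60.
print(round(partial_r(0.60, 0.70, 0.70), 3))

# Third variable unrelated to both: the partial equals r12.
print(partial_r(0.50, 0.0, 0.0))

# One of r13, r23 negative: partialing out X3 raises the correlation.
print(round(partial_r(0.50, -0.30, 0.40), 3))
```

In the last case the third variable was pushing the two traits in opposite directions, so removing it uncovers a stronger relation than the obtained r suggested.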

The difficulties encountered in determining the direction of causation
make it necessary to be careful in the use of the partial correlation
technique. When it is said that heterogeneity with respect to a third variable
($X_3$) has in part (or entirely) produced correlation between $X_1$ and $X_2$, we
must ask how the influence of $X_3$ comes about. Now if it can be argued that
variation in $X_3$ is a cause of variation in $X_1$ and $X_2$, it is readily seen that
$r_{12}$ is at least in part attributable to the fact that $X_1$ and $X_2$ have a common
source of variation. The partial, $r_{12.3}$, tells us the degree of correlation
between $X_1$ and $X_2$ which would exist provided variation in $X_3$ were
controlled. But if it cannot be claimed that $X_3$ produces variation in $X_1$
and $X_2$, the interpretation of the partial r is far from clear. Suppose $X_1$
precedes $X_3$ in a temporal sense so that we know variation on $X_3$ couldn’t
possibly contribute to variation in $X_1$. Does it make sense to interpret $r_{12.3}$
as the correlation between $X_1$ and $X_2$ with the influence of $X_3$ nullified when
we know that $X_3$ could not influence $X_1$? Stated differently, the only way
that $X_3$ can produce or contribute to the correlation between $X_1$ and $X_2$ is
by way of $X_3$ producing variation in $X_1$ and $X_2$.

The technique can be extended for “partialing out” or eliminating more
than one variable. Thus, to obtain an estimate of $r_{12}$ with $X_3$ and $X_4$ held
constant, we can use

$$r_{12.34} = \frac{r_{12.3} - r_{14.3}r_{24.3}}{\sqrt{1 - r_{14.3}^2}\,\sqrt{1 - r_{24.3}^2}}$$

which is in terms of first-order partials calculable by formula (10.36).
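Because the second-order partial has the same form as the first-order one, it can be computed by applying the first-order formula twice. The sketch below illustrates this; the function names and the six correlations are hypothetical, chosen only to form a plausible correlation matrix.

```python
import math


def partial_r(r12, r13, r23):
    """First-order partial correlation, formula (10.36)."""
    return (r12 - r13 * r23) / math.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))


def partial_r_second_order(r12, r13, r14, r23, r24, r34):
    """r12.34 built from the first-order partials r12.3, r14.3 and r24.3."""
    r12_3 = partial_r(r12, r13, r23)
    r14_3 = partial_r(r14, r13, r34)
    r24_3 = partial_r(r24, r23, r34)
    return partial_r(r12_3, r14_3, r24_3)


# Hypothetical intercorrelations among four variables.
print(round(partial_r_second_order(0.50, 0.40, 0.30, 0.35, 0.25, 0.20), 3))
```

A useful check on hand computation is that the result does not depend on which of the two variables is eliminated first.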
The sampling error of the partial coefficient may be handled by the z
transformation. The standard error of the corresponding z will be
$1/\sqrt{N - 4}$ when only one variable has been eliminated, and $1/\sqrt{N - 5}$
when two variables have been eliminated.

The partial correlation coefficient based on a small sample can also be
tested for significance by the t technique. If one variable has been
eliminated, we have

$$t = \frac{r_{12.3}}{\sqrt{(1 - r_{12.3}^2)/(N - 3)}}$$

with $df = N - 3$. An additional degree of freedom is lost for each
additional variable eliminated.
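Both significance procedures reduce to one-line computations. The sketch below assumes the usual Fisher z transformation (the inverse hyperbolic tangent) and the usual t ratio for a correlation with N - 3 degrees of freedom after one variable is eliminated; the function names and sample figures are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)


def standard_error_of_z(n, eliminated=1):
    """1/sqrt(N - 4) for one eliminated variable, 1/sqrt(N - 5) for two."""
    return 1.0 / math.sqrt(n - 3 - eliminated)


def t_for_partial_r(r, n, eliminated=1):
    """t ratio for a partial r; one df is lost per eliminated variable."""
    df = n - 2 - eliminated
    return r * math.sqrt(df) / math.sqrt(1.0 - r ** 2), df


# Hypothetical partial r of .50 from N = 28 cases, one variable eliminated.
t, df = t_for_partial_r(0.50, 28)
print(round(t, 2), df)
print(round(fisher_z(0.50) / standard_error_of_z(28), 2))
```

The resulting t is referred to a t table with the stated degrees of freedom, and the z ratio to the normal curve.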

A perplexing and often-recurring question with regard to the
interrelations of three variables is this: Are the correlations consistent among
themselves, or, if $r_{12}$ and $r_{13}$ are known, what are the possible limits for $r_{23}$?
If $r_{12}$ = unity and $r_{13}$ = unity, $r_{23}$ must also equal unity, but, if $r_{12} = 0$
and $r_{13} = 0$, does it follow that $r_{23} = 0$? It can be shown that the limits for
the correlation $r_{23}$ will always be

$$r_{12}r_{13} \pm \sqrt{1 - r_{12}^2 - r_{13}^2 + r_{12}^2r_{13}^2}$$

Examples:

[...] the limits for $r_{23}$ are +.62 and +1.00;
[...] the limits are -.50 and +1.00;
[...] the limits are -.875 and +1.00.
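The limits r12 r13 ± sqrt(1 - r12^2 - r13^2 + r12^2 r13^2) are readily tabulated; the quantity under the radical factors as (1 - r12^2)(1 - r13^2). In the sketch below the function name is hypothetical, and the input pairs are merely values that happen to reproduce the three sets of limits listed above; they are not taken from the text.

```python
import math


def limits_for_r23(r12, r13):
    """Lower and upper limits for r23 given r12 and r13."""
    half_width = math.sqrt((1.0 - r12 ** 2) * (1.0 - r13 ** 2))
    return r12 * r13 - half_width, r12 * r13 + half_width


for r in (0.90, 0.50, 0.25):
    low, high = limits_for_r23(r, r)
    print(r, round(low, 3), round(high, 3))

# With r12 = r13 = 0 nothing whatever is implied about r23.
print(limits_for_r23(0.0, 0.0))
```

The last line answers the question raised in the text: two zero correlations leave the third free to take any value from -1.00 to +1.00.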


Part correlation. There are times when we may wish to have the
correlation between variables $X_1$ and $X_2$ with the influence of $X_3$ “removed”
from $X_2$ only. For example, we may wish to calculate the correlation
between intelligence and incidental memory with general memory partialled
out of the incidental memory; or we may wonder what the correlation is
between reading ability and academic achievement with the influence of
college aptitude “taken out” of academic achievement; or we may wish to
determine the correlation between intelligence and a set of final scores,
obtained after extensive practice, with initial level partialled out of the
final variance.

In symbols, if we seek the correlation between $X_1$ and $X_2$ with $X_3$
partialled out of $X_2$, we would in effect be correlating $X_1$ with the residual,
$X_{2.3}$. From the derivation of $r_{12.3}$ it can be easily deduced that an
appropriate formula is

$$r_{1(2.3)} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{23}^2}} \qquad (10.37)$$

which is referred to as a part correlation coefficient. The part correlation
is useful when it can be argued that [...] causes [...]
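The part correlation, (r12 - r13 r23) / sqrt(1 - r23^2), differs from the partial only in its denominator, so the two are easily compared side by side. The sketch below is illustrative; the function names and the correlations are hypothetical.

```python
import math


def part_r(r12, r13, r23):
    """Correlation of X1 with X2 after X3 is removed from X2 only."""
    return (r12 - r13 * r23) / math.sqrt(1.0 - r23 ** 2)


def partial_r(r12, r13, r23):
    """First-order partial: X3 removed from both X1 and X2."""
    return (r12 - r13 * r23) / math.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))


# Hypothetical values: r12 = .60, r13 = .70, r23 = .70.
print(round(part_r(0.60, 0.70, 0.70), 3))
print(round(partial_r(0.60, 0.70, 0.70), 3))
```

Since the part coefficient equals the partial multiplied by sqrt(1 - r13^2), it can never exceed the partial in absolute size.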


SUMMARY 

In this chapter, consideration has been given to factors which have a
bearing on the magnitude of the correlation coefficient. If any of these is
operative in the case of a particular coefficient, it is the responsibility of the
investigator to qualify his conclusions accordingly. Published reports of
correlational studies should include the following:

1. A definition of the population being sampled and a statement of the
method used in drawing the sample.

2. The size of the sample and an adequate treatment of sampling errors by
means of nonantiquated formulas.

3. The means and particularly the standard deviations of the variables
being correlated, with some indication as to whether the sample is typical
with respect to the variables under consideration.

4. The reliability coefficients for the measures and the method of
determining reliability.

5. A statement relative to the homogeneity of the sample with respect to
possibly relevant variables such as age, sex, race.

6. A defense or precise interpretation of any reported correlations
involving indexes or of any part-whole correlations.

The researcher who is cognizant of the assumptions requisite for a given
interpretation of a correlation coefficient and who is also fully aware of
the many factors which may affect its magnitude will not regard the
correlational technique as an easy road to scientific discovery.



Chapter 11 

MULTIPLE CORRELATION 


So far our discussion of correlation has been concerned with the
prediction of one variable from another or the attributing of a portion of
the variance of one variable to the action of a second variable. We shall
next consider the case where it is desired to predict one variable by using
several other variables as a team of predictors, or where, if causation can
be assumed, an attempt is made to analyze the variance for one variable
into components or parts attributable to the action of two or more other
variables. There is a close connection between the predicting and the
analyzing problems; let us first consider the method of predicting one
variable on the basis of other variables.


THE THREE-VARIABLE PROBLEM 

For simplicity, consider the problem of predicting $X_1$ from a knowledge
of $X_2$ and $X_3$. The $X_1$ variable is frequently called the criterion, or
dependent variable. If we had $X_1$ to be predicted from $X_2$ alone, we would have
exactly the same situation as predicting $Y$ from $X$. That is, the
prediction equation (in gross score form)

$$Y' = BX + A$$

becomes

$$X'_1 = BX_2 + A$$

and the deviation form

$$y' = bx + a$$

becomes

$$x'_1 = bx_2 + a$$


It will be recalled that the values of the constants, B and A, or b and a,
were so chosen as to give the maximum predictability, and that B and
A were functions of the correlation coefficient between the two
variables and of the means and standard deviations for the variables. The
equation which resulted from giving A and B specific values was said to be
the regression equation. For the prediction of $X_1$ from $X_2$ and $X_3$, we
must have an equation of the form

$$X'_1 = B_2X_2 + B_3X_3 + A \tag{11.1}$$

which can be written in deviation units as

$$x'_1 = b_2x_2 + b_3x_3 + a$$

This is the equation of a plane. It can be shown
that $B_2 = b_2$ and $B_3 = b_3$. In fact, this is rather obvious when we consider
the meaning of these B and b coefficients. They represent the slope of the
plane; $B_2$ is the slope which the plane makes with the $x_2$ axis and $B_3$ the
slope with regard to the $x_3$ axis. When we shift from raw to deviation
scores, we merely shift the origin, or point of reference, to the
intersection of the means; the mean point in terms of deviation scores becomes
zero. This shift in the point of reference does not change the position or
angle of the plane; hence $B_2 = b_2$ and $B_3 = b_3$. (The student will recall
that for the two-variable problem the slope of the line was equal
to B or b.)

It remains to attach meaning to A and a. In the equation $Y' = BX + A$
it was noted that the constant was the Y intercept, i.e., the value of Y
where the line cut the y axis. It was also found that a = 0; i.e., that in the
deviation form the line cut the y axis at the origin. Perhaps the student
has already anticipated, by analogy, that the A in our three-variable
equation is the value of $X_1$ where the plane cuts the $X_1$ axis, and that the
value of a will become zero.

Before going farther, it might be well to take a look at the problem
geometrically. In the case of two variables, after plotting the X and Y
scores in a scatter diagram, we can readily picture the meaning of B and A,
and gain some notion of why certain values of B and A will lead to
better predictions than would be yielded by other values. In the case of
three variables, $X_1$, $X_2$, and $X_3$, we have a trio instead of a pair of
measurements for each individual. To make a plot of N such sets of
measurements, we will need to use a three-dimensional scheme. Instead of
placing a tally mark in a cell defined by an interval along the x axis and one
along the y axis, we place it in a cell defined by intervals on the $X_1$, $X_2$,
and $X_3$ axes. Instead of a square cell, we have a cubical cell.

Suppose an individual’s three scores fall in intervals $i_1$, $i_2$, and $i_3$; then
his “tally” will be placed in the cubicle formed at the intersection of these 
three intervals. The total number of cubicles will be the product of the 
number of intervals on each axis, and an individual’s location in the box 
will depend on all three of his scores. The student may be at a loss to know 
just how he could make such a three-dimensional scattergram. Actually, 
this diagram is not necessary, but it is of interest to imagine what such a 
three-way distribution would look like. If the correlations, r 12 , r 13 , and r 23 , 
are fairly high (and positive), and if we think of the frequencies in the 
several cubicles as being represented by dots (or different degrees of 
density), then the swarm of dots will extend from the lower left front to the 
upper right back of the box. The greatest density will be at the center of 
this swarm, and the density or frequency will fall off in all directions from 
the center. The swarm will have the general shape and appearance of a 
watermelon (ellipsoidal). 

Imagine that a plane is to be cut through this swarm. Our job is to so
locate the plane that, when we start upward vertically from any point on
the bottom of the box, say the spot defined by any pair of values for $X_2$ and
$X_3$, we will find that the altitude, i.e., the distance along the $X_1$ axis at which
the plane is reached, will constitute the best estimate of $X_1$ for individuals
having any given $X_2$ and $X_3$ scores. With a little reflection, the reader can
see that, of many ways of placing the plane, some positions will obviously
give very poor estimates, whereas others will lead to better estimates.
What we need is that plane which for the given N sets of $X_1$, $X_2$, and $X_3$
scores will yield the best possible estimates.

The criterion of “best” is a least square affair: the sum of the squares
of the errors of estimate shall be a minimum. The task is really that of
determining the values of A, $B_2$, and $B_3$ in formula (11.1) so that

$$\sum (X_1 - X'_1)^2$$

is a minimum. That is, we are to assign to A, $B_2$, and $B_3$ those values which
will permit the best possible estimate of an unknown $X_1$ when we know the
$X_2$ and $X_3$ values for the individual. The principle to be used is exactly
the same as that employed to obtain the optimum value for B and A for
the two-variable problem, but the present problem is more complicated
because we have to determine the values for three constants.

Derivation of regression equations. Our task is simplified if deviation
scores are used, and we assume a = 0 (if we carried a along, it would prove
to be zero). It is simplified somewhat more if we transform all three sets
of scores into standard score form, i.e., if we set $z = (X - M)/S$. Then
our equation becomes

$$z'_1 = \beta_2z_2 + \beta_3z_3 \tag{11.2}$$

It should be noted that, since we are changing the size of our unit of
measure, it cannot be argued that $\beta_2$ will equal $B_2$ or $b_2$. The task now is to
determine the value of the beta coefficients, $\beta_2$ and $\beta_3$, so as to have the
best possible estimate of $z_1$, or so that the average of the squared errors,
$\sum (z_1 - z'_1)^2/N$, shall be minimized. That is, the function to be minimized is

$$f = \frac{\sum (z_1 - z'_1)^2}{N} = \frac{\sum (z_1 - \beta_2z_2 - \beta_3z_3)^2}{N}$$

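The calculus is used to determine the values of $\beta_2$ and $\beta_3$ which will make
this function a minimum. We take the partial derivative of the function
first with respect to $\beta_2$, then with respect to $\beta_3$. Thus,

$$\frac{\partial f}{\partial \beta_2} = \frac{-2\sum z_2(z_1 - \beta_2z_2 - \beta_3z_3)}{N}$$

$$\frac{\partial f}{\partial \beta_3} = \frac{-2\sum z_3(z_1 - \beta_2z_2 - \beta_3z_3)}{N}$$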


These two derivatives are to be set equal to zero and then solved
simultaneously for the two unknowns, $\beta_2$ and $\beta_3$. Performing the indicated
multiplications, summing, and dividing each equation by 2, we get

$$-\frac{\sum z_1z_2}{N} + \beta_2\frac{\sum z_2^2}{N} + \beta_3\frac{\sum z_2z_3}{N} = 0$$

$$-\frac{\sum z_1z_3}{N} + \beta_2\frac{\sum z_2z_3}{N} + \beta_3\frac{\sum z_3^2}{N} = 0$$

Since we are dealing with standard scores, we can now capitalize on
certain properties thereof, namely, that the sum of their squares divided by
N is unity, whereas any sum of cross products divided by N is the correlation
between the two variables involved in the cross products. Thus, we
have

$$-r_{12} + \beta_2 + \beta_3r_{23} = 0$$

$$-r_{13} + \beta_2r_{23} + \beta_3 = 0$$

or

$$\beta_2 + r_{23}\beta_3 - r_{12} = 0$$

$$r_{23}\beta_2 + \beta_3 - r_{13} = 0$$

Since the rs in the equations are determinable for any given sample of data,
they are in effect knowns, whereas the $\beta$s are unknowns. We therefore have
two simultaneous equations with two unknowns. These can readily be
solved by a number of methods which the student will find in an algebra
textbook. Straightforward solution gives

$$\beta_2 = \frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2}$$

$$\beta_3 = \frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2}$$
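The closed-form solution for the two beta coefficients can be checked by substituting it back into the pair of simultaneous equations. This is a minimal sketch; the function name and the three correlations are hypothetical.

```python
def betas_three_variable(r12, r13, r23):
    """Beta weights for predicting z1 from z2 and z3."""
    denom = 1 - r23**2
    beta2 = (r12 - r13 * r23) / denom
    beta3 = (r13 - r12 * r23) / denom
    return beta2, beta3

# Hypothetical intercorrelations, for illustration only.
r12, r13, r23 = 0.60, 0.50, 0.40
beta2, beta3 = betas_three_variable(r12, r13, r23)

# Both simultaneous equations should be satisfied (residuals of zero).
residual_1 = beta2 + r23 * beta3 - r12
residual_2 = r23 * beta2 + beta3 - r13
print(round(beta2, 4), round(beta3, 4), residual_1, residual_2)
```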


As soon as we have computed the rs, we can easily determine the $\beta$s.
The obtained numerical values can then be substituted in the prediction
equation

$$z'_1 = \beta_2z_2 + \beta_3z_3$$

so that for a given pair of $z_2$ and $z_3$ values we can predict the standard
score on the criterion variable. However, in practice it is ordinarily more
convenient to deal with raw scores; hence we need our prediction equation
in raw score form. Obviously, if we replace the zs in the preceding equation
by their values in terms of raw scores, means, and standard deviations, we
will have

$$\frac{X'_1 - M_1}{S_1} = \beta_2\frac{X_2 - M_2}{S_2} + \beta_3\frac{X_3 - M_3}{S_3}$$

or

$$\frac{X'_1}{S_1} - \frac{M_1}{S_1} = \beta_2\frac{X_2}{S_2} - \beta_2\frac{M_2}{S_2} + \beta_3\frac{X_3}{S_3} - \beta_3\frac{M_3}{S_3}$$

Multiplying by $S_1$ and rearranging terms, we have

$$X'_1 = \beta_2\frac{S_1}{S_2}X_2 + \beta_3\frac{S_1}{S_3}X_3 + \left(M_1 - \beta_2\frac{S_1}{S_2}M_2 - \beta_3\frac{S_1}{S_3}M_3\right) \tag{11.3}$$

from which we see that our original $B_2$ must equal $\beta_2(S_1/S_2)$, $B_3 = \beta_3(S_1/S_3)$,
and A = the parentheses term. Thus we can readily determine the
numerical values of $B_2$, $B_3$, and A and thereby have the constants for the
prediction equation. Actually, the values of $B_2$ and $B_3$ are the optimum
weights to be assigned to $X_2$ and $X_3$ in order to predict $X_1$.
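The passage from beta coefficients to the raw score constants of formula (11.3) can be sketched on made-up data. Everything in the block (the helper names, the simulated scores, the random seed) is an assumption for illustration. Standard deviations use N as divisor, in keeping with the chapter's use of sums divided by N.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))  # divisor N

def corr(u, v):
    mu, mv = mean(u), mean(v)
    cross = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cross / (len(u) * sd(u) * sd(v))

def raw_score_equation(x1, x2, x3):
    """Return B2, B3, A of X1' = B2*X2 + B3*X3 + A via the beta coefficients."""
    r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
    beta2 = (r12 - r13 * r23) / (1 - r23 ** 2)
    beta3 = (r13 - r12 * r23) / (1 - r23 ** 2)
    b2 = beta2 * sd(x1) / sd(x2)
    b3 = beta3 * sd(x1) / sd(x3)
    a = mean(x1) - b2 * mean(x2) - b3 * mean(x3)
    return b2, b3, a

def sum_sq_error(x1, x2, x3, b2, b3, a):
    return sum((y - (b2 * u + b3 * v + a)) ** 2 for y, u, v in zip(x1, x2, x3))

# Simulated scores (hypothetical), 200 cases.
random.seed(0)
x2 = [random.gauss(50, 10) for _ in range(200)]
x3 = [0.5 * u + random.gauss(20, 8) for u in x2]
x1 = [0.3 * u + 0.2 * v + random.gauss(5, 4) for u, v in zip(x2, x3)]

b2, b3, a = raw_score_equation(x1, x2, x3)
best = sum_sq_error(x1, x2, x3, b2, b3, a)
print(round(b2, 3), round(b3, 3), round(a, 3))
```

Nudging any one of the three constants away from its computed value raises the sum of squared errors. That is the least square property these constants are meant to have.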

Error of estimate. The accuracy of the prediction of $X_1$ by the best
combination of $X_2$ and $X_3$ can be ascertained by examining the error term
$X_1 - X'_1$ or $z_1 - z'_1$. The sum of the squares for the errors divided
by N will yield the variance of the errors. The square root would
correspond to the standard error of estimate. Let $S_{z_{1.23}}$ be this error (in z
units), then

$$S^2_{z_{1.23}} = \frac{\sum (z_1 - z'_1)^2}{N} = \frac{\sum (z_1 - \beta_2z_2 - \beta_3z_3)^2}{N}$$

$$= \frac{\sum z_1^2}{N} + \beta_2^2\frac{\sum z_2^2}{N} + \beta_3^2\frac{\sum z_3^2}{N} - 2\beta_2\frac{\sum z_1z_2}{N} - 2\beta_3\frac{\sum z_1z_3}{N} + 2\beta_2\beta_3\frac{\sum z_2z_3}{N}$$

$$= 1 + \beta_2^2 + \beta_3^2 - 2\beta_2r_{12} - 2\beta_3r_{13} + 2\beta_2\beta_3r_{23}$$

which by algebraic manipulation reduces to

$$S^2_{z_{1.23}} = 1 - (\beta_2r_{12} + \beta_3r_{13}) \tag{11.4}$$

in terms of standard scores. Then $S_1^2$ times this would give the error
variance for raw scores.

Multiple r. We next define the multiple correlation coefficient as
the correlation between $z_1$ and the best estimate of $z_1$ from a knowledge of
$z_2$ and $z_3$. In symbols,

$$r_{1.23} = r_{z_1z'_1} = \frac{\sum z_1z'_1}{NS_{z_1}S_{z'_1}} = \frac{\sum z_1(\beta_2z_2 + \beta_3z_3)}{NS_{z'_1}} \tag{11.5}$$

Note that, although $S_{z_1} = 1$, it does not follow that $S_{z'_1} = 1$. In order to
evaluate this last S, we write

$$z_1 = z'_1 + z_{1.23}$$

That is, we think of $z_1$ as being made up of two parts: that which we can
estimate plus a residual. It can easily be shown that these two parts are
independent of each other; hence by the variance theorem we have

$$S^2_{z_1} = S^2_{z'_1} + S^2_{z_{1.23}}$$

or

$$1 = S^2_{z'_1} + S^2_{z_{1.23}}$$

then

$$S^2_{z'_1} = 1 - S^2_{z_{1.23}}$$

But $S^2_{z_{1.23}}$ is nothing more than the variance of the prediction errors as
given by (11.4); therefore

$$S^2_{z'_1} = \beta_2r_{12} + \beta_3r_{13}$$

Then, by substituting in formula (11.5), we have

$$r_{1.23} = \frac{\beta_2\frac{\sum z_1z_2}{N} + \beta_3\frac{\sum z_1z_3}{N}}{\sqrt{\beta_2r_{12} + \beta_3r_{13}}} = \frac{\beta_2r_{12} + \beta_3r_{13}}{\sqrt{\beta_2r_{12} + \beta_3r_{13}}} = \sqrt{\beta_2r_{12} + \beta_3r_{13}} \tag{11.6}$$

It can also be shown that

$$r^2_{1.23} = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}$$

We thus see that, as soon as the $\beta$s are determined, we can write the
regression equation for predicting $z_1$ from $z_2$ and $z_3$ and can also specify the
degree of correlation and calculate the error of estimate. This error
obviously can be written from formulas (11.4) and (11.6) as

$$S_{1.23} = S_1\sqrt{1 - r^2_{1.23}} \tag{11.7}$$

which is in terms of raw scores.

Formula (11.7) has been used frequently to define the multiple correlation
coefficient. Stated explicitly,

$$r^2_{1.23} = 1 - \frac{S^2_{1.23}}{S_1^2} = 1 - S^2_{z_{1.23}}$$

Then, by substituting from (11.4), we again arrive at (11.6).

The student will note the similarity of formula (11.7) to the ordinary 
error of estimate for the bivariate situation. Thus the multiple correlation 
coefficient can be interpreted, in terms of reduction in the error of estimate, 
in exactly the same manner as the ordinary bivariate correlation coefficient. 
The only difference is that we are now determining the regression
coefficients, or weights for two variables as a team, so as to get the best possible
prediction of a third variable, whereas in the bivariate situation only one 
regression coefficient is necessary. A multiple correlation coefficient of .60 
has, aside from minor qualifications to be discussed later, the same meaning 
in a predictive sense as an ordinary correlation of .60. Furthermore, the 
interpretation in terms of contribution to variance also holds for the 
multiple correlation coefficient; i.e., if causation can be assumed, it may 
be said that a multiple r of .60 indicates that 36 per cent of the variance in 
the criterion or dependent variable can be attributed to variation in the 
two independent variables. 
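The two expressions for the multiple correlation in the three-variable case, and the raw score error of estimate of formula (11.7), can be compared numerically. The sketch below is illustrative only; the correlations and the criterion standard deviation are hypothetical.

```python
import math

def multiple_r_three(r12, r13, r23):
    """Multiple r for three variables, computed two ways."""
    denom = 1 - r23**2
    beta2 = (r12 - r13 * r23) / denom
    beta3 = (r13 - r12 * r23) / denom
    via_betas = math.sqrt(beta2 * r12 + beta3 * r13)                 # formula (11.6)
    via_rs = math.sqrt((r12**2 + r13**2 - 2 * r12 * r13 * r23) / denom)
    return via_betas, via_rs

def error_of_estimate(s1, multiple_r):
    """Raw score standard error of estimate, formula (11.7)."""
    return s1 * math.sqrt(1 - multiple_r**2)

# Hypothetical values, for illustration only.
via_betas, via_rs = multiple_r_three(0.60, 0.50, 0.40)
s_est = error_of_estimate(10.0, via_betas)
print(round(via_betas, 4), round(via_rs, 4), round(s_est, 3))
```

The squared multiple r is the proportion of criterion variance accounted for, so in this made-up case about 44 per cent of the variance would be attributed to the two predictors.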

Relative weights. The question arises as to the relative importance
of the two variables as contributors to variation in the criterion variable.
The B coefficients in the regression equation have, at times, been
misinterpreted as indicating the relative contribution of the two independent
variables. The reader need only be reminded that the two B coefficients
usually involve different units of measurement (one may be in terms of feet
and the other in pounds); hence they are not comparable at all. If $B_2$ is
numerically twice $B_3$, it does not follow that $X_2$ is twice as important as $X_3$.
In order to get around this difficulty, we must think in terms of standard
scores; these will be comparable, and hence the $\beta$ coefficients in the
standard score form of the regression equation will be comparable.

Since

$$S^2_{z_1} = S^2_{z'_1} + S^2_{z_{1.23}}$$

or

$$1 = S^2_{z'_1} + S^2_{z_{1.23}}$$

it follows that

$$1 - S^2_{z_{1.23}} = r^2_{1.23} = S^2_{z'_1}$$

That is, $r^2_{1.23}$, which corresponds to the percentage of variance explained,
is equal to $S^2_{z'_1}$, or the variance of the predicted standard scores. This
variance could be determined by actually making N predictions of $z_1$
from the N pairs of values of $z_2$ and $z_3$ and then computing the S for the
distribution of these predicted values. This is not done in practice, since
the value of this S squared is $r^2_{1.23}$, which is easily calculated once the $\beta$s
have been determined.

But note that, since

$$z'_1 = \beta_2z_2 + \beta_3z_3$$

we can indicate the value of $S^2_{z'_1}$ as

$$S^2_{z'_1} = \frac{\sum (z'_1)^2}{N} = \frac{\sum (\beta_2z_2 + \beta_3z_3)^2}{N} = \frac{\beta_2^2\sum z_2^2 + \beta_3^2\sum z_3^2 + 2\beta_2\beta_3\sum z_2z_3}{N}$$

which becomes

$$S^2_{z'_1} = \beta_2^2 + \beta_3^2 + 2\beta_2\beta_3r_{23} \tag{11.8}$$

In other words, the predicted variance, which corresponds to the
“explained” variance, can be broken down into three additive components.
We thus see that the relative importance of the variables $X_2$ and $X_3$ in
“explaining” or “causing” variation in $X_1$ can be judged by the magnitude
of the squares of the $\beta$ coefficients. The third term in formula (11.8)
represents a joint contribution which, it will be seen, is a function of the
amount of correlation between the two predicting variables.

In summary, it can be said that the fundamental problem in multiple
correlation is that of obtaining the optimum weighting to be assigned to
independent variables ($X_2$ and $X_3$) in predicting or explaining variation in a
dependent variable, $X_1$. That is, we determine the value of $B_2$, $B_3$, and A
in the equation

$$X'_1 = B_2X_2 + B_3X_3 + A$$

so as to get the best possible estimate of $X_1$. This is resolved by working
with the prediction equation in standard score form with $\beta$ coefficients.
The value of each is determinable from the intercorrelations among the
three variables. Once the $\beta$s are calculated, we can: (1) readily compute
the B coefficients needed in the raw score form of the prediction equation;
(2) determine the value of the multiple correlation coefficient and the error
of estimate; (3) ascertain the relative importance of the independent
variables as predictors or, if causation can be assumed, as contributors to
the variance of the dependent or criterion variable. It is important to note
that the multiple correlation coefficient represents the maximum correlation
to be expected between the dependent variable and a linearly additive
combination of $X_2$ and $X_3$.
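The breakdown of the predicted variance into three additive components can be illustrated with a few lines of arithmetic. This is a sketch under assumed values; the function name, the dictionary keys, and the correlations are hypothetical.

```python
def explained_variance_components(r12, r13, r23):
    """Split r^2(1.23) into beta2^2, beta3^2, and the joint term 2*beta2*beta3*r23."""
    denom = 1 - r23**2
    beta2 = (r12 - r13 * r23) / denom
    beta3 = (r13 - r12 * r23) / denom
    components = {
        "beta2_squared": beta2**2,
        "beta3_squared": beta3**2,
        "joint": 2 * beta2 * beta3 * r23,
    }
    r_squared = beta2 * r12 + beta3 * r13   # square of formula (11.6)
    return components, r_squared

# Hypothetical intercorrelations, for illustration only.
components, r_squared = explained_variance_components(0.60, 0.50, 0.40)
for name, value in components.items():
    print(name, round(value, 4))
print("total", round(sum(components.values()), 4), "r squared", round(r_squared, 4))
```

The three components add up to the squared multiple r, and the joint term vanishes when the two predictors are uncorrelated.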


MORE THAN THREE VARIABLES 

Suppose that we have a dependent variable and four independent 
variables which might be used as predictors or which might be thought of 
as causes of variation in the dependent variable. The cause and effect, as 
opposed to concomitant, relationship among variables is a logical problem 
which must be faced by the investigator as a logician rather than as a 
statistician. Whether we resort to the multiple correlation technique as an 
aid in predicting or as an aid in analysis will depend entirely on the problem 
being attacked; the mechanical solution is the same, but the investigator 
must choose the interpretation which best suits his purpose. 

For a five-variable problem, we need the constants in the regression or
prediction equation,

$$X'_1 = B_2X_2 + B_3X_3 + B_4X_4 + B_5X_5 + A$$

which can be written in standard score form as

$$z'_1 = \beta_2z_2 + \beta_3z_3 + \beta_4z_4 + \beta_5z_5$$

As in the three-variable situation, the problem is that of determining the
optimum values of the Bs or the $\beta$s so as to get the best possible prediction
of $X_1$ or $z_1$, i.e., so that

$$\frac{\sum (X_1 - X'_1)^2}{N}$$

or

$$\frac{\sum (z_1 - z'_1)^2}{N}$$

shall be as small as possible. The mathematical solution is easier by way
of the standard score form of the regression equation. We have the
function

$$f = \frac{\sum (z_1 - z'_1)^2}{N} = \frac{\sum (z_1 - \beta_2z_2 - \beta_3z_3 - \beta_4z_4 - \beta_5z_5)^2}{N} \tag{11.9}$$

which is to be minimized by assigning proper values to the $\beta$s. These
values are obtained by taking the derivative of the function with respect to,
and in order for, each of the $\beta$s. This will yield four derivatives which
when set equal to zero will give us four equations involving the four
unknown $\beta$s. These equations can then be solved as simultaneous
equations in order to determine the values of the $\beta$s. The obtained $\beta$s will
be such that the sum of the squares of $z_1 - z'_1$ will be the least possible; i.e.,
we will have the best possible estimate of $z_1$ from an additive combination
of the four independent variables.

The student of the calculus can readily verify that the four equations
obtained by taking derivatives of formula (11.9) will take the following
form (when set equal to zero):

$$\begin{aligned}
\beta_2 + \beta_3r_{23} + \beta_4r_{24} + \beta_5r_{25} - r_{12} &= 0 \\
\beta_2r_{23} + \beta_3 + \beta_4r_{34} + \beta_5r_{35} - r_{13} &= 0 \\
\beta_2r_{24} + \beta_3r_{34} + \beta_4 + \beta_5r_{45} - r_{14} &= 0 \\
\beta_2r_{25} + \beta_3r_{35} + \beta_4r_{45} + \beta_5 - r_{15} &= 0
\end{aligned} \tag{11.10}$$

These equations result from steps exactly parallel to those used for the
three-variable problem. The four $\beta$s are unknowns, whereas, for any given
batch of data, the rs take on specific numerical values.
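With numerical values for the rs, the four equations in (11.10) form an ordinary linear system. The sketch below solves it by plain Gaussian elimination, which is a stand-in for the chapter's hand methods and not one of them. The helper `solve_linear` is an illustrative assumption. The correlations are the ones read from the chapter's Table 11.3 (Minnesota data).

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x

# Intercorrelations among X2..X5 (coefficients of the unknown betas).
r_predictors = [
    [1.00, 0.56, 0.49, 0.42],
    [0.56, 1.00, 0.63, 0.46],
    [0.49, 0.63, 1.00, 0.39],
    [0.42, 0.46, 0.39, 1.00],
]
# Correlations of X2..X5 with the criterion X1 (constants moved to the right side).
r_criterion = [0.55, 0.53, 0.52, 0.64]

betas = solve_linear(r_predictors, r_criterion)
print([round(b, 3) for b in betas])
```

The values should fall close to the $\beta$s of the worked Doolittle solution (.230, .087, .185, .431). Differences in the third decimal are to be expected, because the hand computation carries only three places.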

The extension of multiple correlation to include any number of variables
involves the same principles as utilized here for the three- and the
five-variable problem. For n variables, formula (11.2) becomes

$$z'_1 = \beta_2z_2 + \beta_3z_3 + \cdots + \beta_nz_n \tag{11.11}$$

The extension of (11.3) as the gross score equation should be obvious.
Formula (11.6) for the multiple correlation coefficient becomes

$$r_{1.23 \cdots n} = \sqrt{\beta_2r_{12} + \beta_3r_{13} + \cdots + \beta_nr_{1n}} \tag{11.12}$$

To solve for the unknown $\beta$s, the student may resort to any of the
schemes given in algebra textbooks for solving simultaneous equations.
One method is by way of determinants and Cramer’s rule. The coefficients
of the unknowns are the intercorrelations among the four independent
variables, whereas the constants in these equations are the respective
correlations of the dependent with the independent variables. In the
application of Cramer’s rule, these constants are thought of as being on the
right-hand side of the equation, i.e., shifted to the right of the equality
mark, with the consequent change of sign. The student should keep in
mind, however, the fact that the original sign of any of the computed
correlation coefficients must be considered.

Solution by Cramer’s rule becomes quite tedious and burdensome for a
problem involving more than four or five variables. Indeed, this
determinantal solution is practically impossible for problems involving a large
number of variables. Fortunately, there is available a simplified solution,
but before turning to it, we would like to indicate some algebraic
manipulations in terms of determinants.

It will be noted from the foregoing simultaneous equations that all the
intercorrelations among the five variables are involved. These correlations
can be conveniently arranged in a table, or in determinantal form. Thus
we can define a major determinant as

$$D = \begin{vmatrix}
1 & r_{12} & r_{13} & r_{14} & r_{15} \\
r_{12} & 1 & r_{23} & r_{24} & r_{25} \\
r_{13} & r_{23} & 1 & r_{34} & r_{35} \\
r_{14} & r_{24} & r_{34} & 1 & r_{45} \\
r_{15} & r_{25} & r_{35} & r_{45} & 1
\end{vmatrix}$$

If we were to delete the first row and first column, the minor which
remains would involve the intercorrelations among the four independent
variables. This minor might be conveniently symbolized as $D_{11}$; i.e., we
have deleted the column and the row which involve the subscript 1. If
we were to delete the row which involves the subscript 1 and the column
involving the subscript 2 throughout, we would symbolize the resulting
minor as $D_{12}$.

Now it can be shown that

$$\beta_2 = \frac{D_{12}}{D_{11}}$$

or any $\beta$, say $\beta_p$, will be

$$\beta_p = (-1)^p\frac{D_{1p}}{D_{11}}$$

where the quantity $(-1)^p$ is an indicator of either a positive or a negative
sign, but the ultimate sign of $\beta_p$ is also dependent on whether the numerical
values of the determinants are positive or negative. It can also be shown
that the multiple correlation coefficient can be written as a function of
determinants, thus

$$r^2_{1.2345} = 1 - \frac{D}{D_{11}}$$

The student who is interested in following a treatment of multiple
correlation in terms of determinants is referred to T. L. Kelley’s Statistical
method.*
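The determinantal expressions can be tried out on a five-variable correlation matrix. The sketch below is only an illustration: the helpers `det` and `minor` are assumptions, the determinant is taken by expansion along the first row (adequate for a 5 by 5 array, though far too slow for large ones), and the correlations are those read from the chapter's Table 11.3.

```python
def det(m):
    """Determinant by expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, value in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(sub)
    return total

def minor(m, i, j):
    """Determinant left after deleting row i and column j (1-based subscripts)."""
    rows = [row for k, row in enumerate(m, start=1) if k != i]
    return det([[v for c, v in enumerate(row, start=1) if c != j] for row in rows])

# Full correlation matrix, variables in the order X1 (criterion), X2, ..., X5.
R = [
    [1.00, 0.55, 0.53, 0.52, 0.64],
    [0.55, 1.00, 0.56, 0.49, 0.42],
    [0.53, 0.56, 1.00, 0.63, 0.46],
    [0.52, 0.49, 0.63, 1.00, 0.39],
    [0.64, 0.42, 0.46, 0.39, 1.00],
]

d = det(R)
d11 = minor(R, 1, 1)
betas = [(-1) ** p * minor(R, 1, p) / d11 for p in range(2, 6)]
r_squared = 1 - d / d11
print([round(b, 3) for b in betas], round(r_squared, 4))
```

The squared multiple r obtained from the determinants agrees with the sum of the products $\beta_p r_{1p}$, as formula (11.12) requires.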


NUMERICAL SOLUTION 

On a desk calculator, the solution of the simultaneous equations for
the unknown $\beta$s can best be accomplished by resort to Doolittle’s method.
This method is applicable to the solution of any simultaneous equations
involving a major determinant which, like D, is symmetrical about the
diagonal. It is also applicable to problems involving fewer or more than
five variables. The first step is to write down the intercorrelations
(coefficients of the unknown $\beta$s) in the form indicated in Table 11.1, in
which the right-hand column contains the correlation of each variable with
the criterion or dependent variable. Negative signs are attached to these
coefficients because, in essence, we are dealing with equations (11.10).
Obviously, if the original sign of an r were negative, it would be preceded
by a plus sign in an arrangement like that in Table 11.1.

Table 11.1. Schema for arranging rs for Doolittle solution

    x_2      x_3      x_4      x_5
    1        r_23     r_24     r_25     -r_12
             1        r_34     r_35     -r_13
                      1        r_45     -r_14
                               1        -r_15

As a numerical example, we shall use data from the Minnesota study of
mechanical ability.† The sample size is 100.

* Kelley, T. L., Statistical method, New York: Macmillan, 1924.

† Paterson, D. G., et al., Minnesota mechanical ability tests, Minneapolis: University
of Minnesota Press, 1930.

$X_1$ = Criterion (mechanical performance-quality).

$X_2$ = Minnesota assembling test.

$X_3$ = Minnesota spatial relations test.

$X_4$ = Paper form board.

$X_5$ = Interest analysis blank.

Since the several means and standard deviations will be needed, these are
recorded in Table 11.2.

Table 11.2. Means and Ss (Minnesota data)

         X_1       X_2        X_3       X_4       X_5
    M   14.94    127.56    1422.90     46.60    107.00
    S    2.09     25.32     296.39     19.45     18.00


Table 11.3 gives the Doolittle solution for the coefficients. Once these
are known, the regression equation in raw score form can be written, and
the multiple r and the error of estimate can be determined. The table
includes an indication of the calculation of these values. The student will 
have to study the schema of the Doolittle solution carefully in order to 
grasp the necessary steps. We shall not attempt a complete exposition of 
the steps since the procedure of each step is indicated in the left-hand side 
of the table. A few remarks, however, will be of aid to the student. 

As already specified, the correlations are written down in an order 
corresponding to equations (11.10) except that values to the left and below 
the diagonal are omitted. The first thing we do is to set up a check column. 
The first entry, 1.92, is obtained by summing, algebraically, the first row of 
correlations (including the diagonal 1.00); the second figure, 2.12, is the 
sum of the second row plus .56; the third entry, 1.99, is the sum of the 
third row plus .49 and .63; and the 1.63 is the sum of the fourth row plus 
.42, .46, and .39. The rule being followed should now be obvious: the jth
entry in the check column is obtained by summing the 1.00 in the jth row
with the values above it and to its right. The student should satisfy himself
that this is equivalent to summing the correlations for the respective 
equations in (11.10). Since the check column will provide, at intervals, an 
automatic check on our computations, this summing should be done at 
least twice to insure accuracy. 

Line 1 of the solution is obtained by copying down line a, the first row
of rs; and line 2 consists of the line 1 values with the signs changed. The
second part of the solution begins with line 3, which is obtained by
copying down the b row of correlations. Line 4 is obtained by multiplying
entries in line 1 by -.56, which figure is found in line 2 directly
above the 1.000 of line 3. As indicated at the left, line 5 results from
summing lines 3 and 4. The ck attached to line 5 indicates that the sum of its
entries agrees with the check column value, 1.045, within limits consistent
with the errors imposed by rounding to three decimal places. Acceptable
discrepancies will be of the order ±.001, ±.002, ..., ±.005, seldom larger.

Line 6 is obtained by multiplying line 5 by the negative reciprocal of its
first entry. The correctness of the reciprocal used is evidenced by the fact
that when multiplied by .686, unity results. The ck attached to -1.524
indicates that summing the entries in line 6 yields the same value as 1.045
multiplied by the negative reciprocal of .686, thus providing another
check. This completes the second part of the solution.

Table 11.3. Computation of multiple r

                                   x_2      x_3      x_4      x_5                 ck
    (a)                           1.00      .56      .49      .42     -.55      1.92
    (b)                                    1.00      .63      .46     -.53      2.12
    (c)                                             1.00      .39     -.52      1.99
    (d)                                                      1.00     -.64      1.63

    (1): line (a)                 1.00      .56      .49      .42     -.55      1.92
    (2)                          -1.00     -.56     -.49     -.42      .55     -1.92

    (3): line (b)                         1.000      .63      .46     -.53      2.12
    (4): (1)(-.56)                        -.314    -.274    -.235     .308    -1.075
    (5): (3) + (4)                         .686     .356     .225    -.222     1.045 ck
    (6): (5)(-1/.686)                    -1.000    -.519    -.328     .324    -1.524 ck

    (7): line (c)                                  1.000      .39     -.52      1.99
    (8): (1)(-.49)                                 -.240    -.206     .270     -.941
    (9): (5)(-.519)                                -.185    -.117     .115     -.542
    (10): (7) + (8) + (9)                           .575     .067    -.135      .507 ck
    (11): (10)(-1/.575)                           -1.000    -.116     .235     -.882 ck

    (12): line (d)                                          1.000     -.64      1.63
    (13): (1)(-.42)                                         -.176     .231     -.806
    (14): (5)(-.328)                                        -.074     .073     -.343
    (15): (10)(-.116)                                       -.008     .016     -.059
    (16): (12) + (13) + (14) + (15)                          .742    -.320      .422 ck
    (17): (16)(-1/.742)                                    -1.000     .431     -.569 ck

    Back solution
    From (17)                                                       .431 = β_5
    From (11)   (.431)(-.116) + .235                                     = β_4 = .185
    From (6)    (.185)(-.519) + (.431)(-.328) + .324                     = β_3 = .087
    From (2)    (.087)(-.56) + (.185)(-.49) + (.431)(-.42) + .55         = β_2 = .230

    Final checks
    (.230)(1.00) + (.087)( .56) + (.185)( .49) + (.431)( .42) - .55 = .000
    (.230)( .56) + (.087)(1.00) + (.185)( .63) + (.431)( .46) - .53 = .001
    (.230)( .49) + (.087)( .63) + (.185)(1.00) + (.431)( .39) - .52 = .001
    (.230)( .42) + (.087)( .46) + (.185)( .39) + (.431)(1.00) - .64 = .000

The third part begins with a copying of row c of the correlations.
The student should now be able to follow the steps; in particular he
should note that a multiplier is secured from the last line of each preceding
part of the solution; that each multiplier is applied in turn to the values in
the line just above it; and that, when all such multipliers have been applied,
the resulting lines are summed with the copied row of correlations. There
will be as many parts to the solution as there are independent variables.
The last part always consists of three columns of figures; the
figure in the middle column is the value for $\beta_n$. In our example $\beta_n = \beta_5 = .431$.
The other $\beta$s are determined by a “back” solution, which always involves
working back through the last line of each earlier part, as
given in Table 11.3. As a final check on all the computations, the four $\beta$s
obtained must be substituted into the four simultaneous equations with
which we started.

To write the regression equation in raw score form, we ordinarily require the
means and standard deviations of the
variables. To get the multiple correlation coefficient, the $\beta$s and appropriate
rs are substituted in formula (11.12), and from the multiple r we obtain the
error of estimate appropriate for judging the accuracy of predictions made
by means of the regression equation. Table 11.3 includes these additional
values.

† For more than five or six variables, the computations are more economically
accomplished by electronic computers.

If the problem involves analysis rather than prediction, there is no need
to set up the regression equation or calculate the error of estimate.
Appropriate interpretations would depend on the $\beta$s and the multiple r.
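The forward and back solution of Table 11.3 can be sketched in code. The function below follows the layout of the table as far as that layout can be read from it: pivot lines such as (1), (5), (10), (16), multiplier lines such as (2), (6), (11), (17), and a running check column. It is an illustrative reconstruction, not a definitive statement of Doolittle's procedure. The name `doolittle_betas` is an assumption, and the extension of formula (11.7) to five variables is taken to have the same form.

```python
import math

def doolittle_betas(upper_rows, criterion_rs):
    """Forward solution with check column, then back solution.

    upper_rows[i] lists row i of the predictor intercorrelations from the
    diagonal rightward, as in rows (a) to (d) of Table 11.3.
    """
    n = len(upper_rows)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(upper_rows):
        for k, v in enumerate(row):
            full[i][i + k] = full[i + k][i] = v       # symmetric matrix
    rows = [full[i] + [-criterion_rs[i]] for i in range(n)]
    rows = [r + [sum(r)] for r in rows]               # append check column
    pivots, multipliers = [], []
    for i in range(n):
        line = rows[i][:]
        # Add each earlier pivot line times its multiplier for this column.
        for piv, mult in zip(pivots, multipliers):
            line = [a + mult[i] * b for a, b in zip(line, piv)]
        # The "ck" test: entries must sum to the check-column entry.
        assert abs(sum(line[:-1]) - line[-1]) < 1e-9
        pivots.append(line)
        multipliers.append([-v / line[i] for v in line])
    betas = [0.0] * n
    for i in range(n - 1, -1, -1):                    # back solution
        m = multipliers[i]
        betas[i] = m[n] + sum(m[k] * betas[k] for k in range(i + 1, n))
    return betas

upper = [[1.00, 0.56, 0.49, 0.42], [1.00, 0.63, 0.46], [1.00, 0.39], [1.00]]
r1 = [0.55, 0.53, 0.52, 0.64]                         # r12, r13, r14, r15
betas = doolittle_betas(upper, r1)

multiple_r = math.sqrt(sum(b * r for b, r in zip(betas, r1)))   # formula (11.12)
means = [14.94, 127.56, 1422.90, 46.60, 107.00]       # Table 11.2
sds = [2.09, 25.32, 296.39, 19.45, 18.00]
bs = [b * sds[0] / s for b, s in zip(betas, sds[1:])] # raw score weights
a = means[0] - sum(b * m for b, m in zip(bs, means[1:]))
error_of_estimate = sds[0] * math.sqrt(1 - multiple_r**2)
print([round(b, 3) for b in betas], round(multiple_r, 3), round(error_of_estimate, 2))
```

Because the code carries full machine precision and not three decimals, its $\beta$s may differ from the tabled values in the last place.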
SAMPLING ERRORS 

The classical formula for the standard error of a multiple correlation
involving n variables is

$$\sigma_{r_{1.23 \cdots n}} = \frac{1 - r^2_{1.23 \cdots n}}{\sqrt{N - n}} \tag{11.13}$$

If N is very large, say over 500, and if the value of $r_{1.23 \cdots n}$ is not too high,
this formula will yield a satisfactory approximation. But when N is
small and the number of variables, n, is large relative to the size of the
sample, the above formula yields an underestimate of the error. The
significance of the multiple correlation coefficient can best be ascertained
by the analysis of variance technique, to be discussed in a later chapter.

Closely related to sampling is the shrinkage of the multiple correlation
coefficient. We may best understand this by taking an extreme case. For
the ordinary bivariate correlation, it is evident on a moment’s reflection
that, if N = 2, the correlation between the two variables must be perfect
positive or perfect negative (it would be indeterminate if for either variable
the two scores were the same); the regression line will pass through both
plotted points on the scatter diagram. That is, insofar as prediction is
concerned, there would be no error. In the case of three variables and
N = 3, it would be possible to pass a plane through all three plotted points.
In general, if n = N, we would get a perfect multiple r. Obviously N must
be greater than n before any meaning can be attached to a multiple r. As
n approaches N, the value of multiple r always approaches unity.

This suggests that, when n is large relative to N, the real significance of
an obtained multiple r is questionable. In other words, the multiple
correlation coefficient is subject to a positive bias, the magnitude of which
depends on the degree to which n approaches N. An unbiased estimate, r',
of the universe value of r_{1.23...n} can be obtained from

$$ r'_{1.23 \ldots n} = \sqrt{1 - (1 - r^2_{1.23 \ldots n}) \left( \frac{N - 1}{N - n} \right)} \qquad (11.14) $$


This is sometimes known as a correction for shrinkage, since it has been
observed that in general the correlation between observed and predicted
values for a new sample tends to be less than the multiple r obtained by
means of the βs computed from the original sample. Obviously, if N is very
large, say 500, and n small, say 10, the amount of bias or expected shrinkage
is so small as to be negligible.
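
The correction for shrinkage can be tried numerically. The following is a minimal sketch, assuming formula (11.14) has the form r' = √[1 − (1 − r²)(N − 1)/(N − n)]; the function name is hypothetical, and returning zero when the corrected quantity turns negative is an added convention, not something stated in the text.

```python
import math

def shrunken_multiple_r(r, n_cases, n_variables):
    """Correct a multiple r for shrinkage, following the form of formula (11.14).

    r           -- multiple correlation obtained from the sample
    n_cases     -- N, the number of cases
    n_variables -- n, the number of variables (criterion included)
    """
    if n_cases <= n_variables:
        raise ValueError("N must exceed n for a multiple r to have meaning")
    corrected_square = 1.0 - (1.0 - r ** 2) * (n_cases - 1) / (n_cases - n_variables)
    # Assumption: a negative corrected square is reported as zero correlation.
    return math.sqrt(max(corrected_square, 0.0))

# Small N relative to n: a multiple r of .44 from 89 cases and 11 variables.
print(round(shrunken_multiple_r(0.44, 89, 11), 3))
# Large N relative to n: the shrinkage is slight.
print(round(shrunken_multiple_r(0.50, 500, 10), 3))
```

With 89 cases and 11 variables the estimate drops from .44 to about .30, whereas with 500 cases and 10 variables a multiple r of .50 shrinks only to about .49.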

CAUTIONS AND REMARKS 

As already indicated, there are two principal uses for the multiple
correlation technique: (1) it yields the optimum weighting of
a series of variables in predicting a criterion and provides an indication
[...]

[...] pitfalls into which the unwary user of the multiple regression and correlation
technique may fall. For example, it is possible to write a multiple regression
equation for predicting school achievement (X_1) from age (X_2)
and mental age (X_3). In standard score form it might be z_1 = .27z_2
+ [...], from which it might be inferred that school achievement depends
on age to a certain extent but on mental age to a greater extent. However,
it is entirely possible to argue that mental age depends partly on school
achievement. One could also use the same data to write the regression of
age on mental age and school achievement; thus z_2 = .5[.]z_1 + .06z_3, from
which the unwary might conclude that age depends on school achievement.

Multiple correlation may be particularly deceptive when we have
available several variables, each of which yields a rather low correlation
with the criterion and from which those yielding the highest correlations
with the criterion are selected for the prediction equation. Such selecting
tends to capitalize on correlations which might be high because of sampling
errors. For example, the author was once requested to compute the
multiple r for an 11-variable problem. None of the predictors showed
a very high correlation with the criterion, the highest being .27. The resulting
multiple was .44, which was statistically significant for the sample of
89 cases. When it was learned that 10 variables out of 40 had been selected
as the most promising, i.e., because they showed the highest correlation
with the criterion, the real significance of the multiple r of .44 was
questioned. That it really was misleading was clearly evidenced by the
fact that for a second and similar sample the variable originally yielding
the highest r (.27) now produced an r of -.11. That is, the supposedly
best single predictor was actually of very doubtful value, and this, coupled
with the tendency for the next highest rs to drop appreciably, meant that
predictions by the regression equation could not be as good as was inferred.

Nothing has been said as yet concerning the principal assumption and
consequent limitation in the use of multiple regression equations, namely,
that regressions for the first-order correlations must be linear. There are
methods for handling multiple correlation for curvilinear regressions. The
reader is referred to M. Ezekiel's Methods of correlation analysis.§

It is not obvious from our discussion that, in general, the increase in the 
multiple correlation which results from adding variables beyond the first 
five or six is very small. This phenomenon of diminishing returns would 
not, of course, operate if we were to find an additional variable which 
correlated much more highly with the criterion than any of those already 
utilized.

Another fact which may not be apparent to the reader is that we can
expect the multiple r to be higher when the intercorrelations among the
predictors are low instead of high. This point can be easily demonstrated
by computing the multiples for, say, r_{12} = .50, r_{13} = .50, and varying values
for r_{23}.
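
The suggested demonstration can be sketched in a few lines. The sketch assumes the standard three-variable expression r²_{1.23} = (r²_{12} + r²_{13} − 2 r_{12} r_{13} r_{23}) / (1 − r²_{23}), which is taken here as equivalent to the chapter's multiple correlation formula; the function name and the particular values tried for r_{23} are illustrative only.

```python
import math

def multiple_r_three_variables(r12, r13, r23):
    """Multiple r of variable 1 with variables 2 and 3 (standard three-variable form)."""
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r_squared)

# Hold both correlations with the criterion at .50 and vary the
# intercorrelation between the two predictors.
for r23 in (0.0, 0.3, 0.6, 0.9):
    print(r23, round(multiple_r_three_variables(0.50, 0.50, r23), 3))
```

For these inputs the multiple falls steadily as r_{23} rises, from about .71 when the predictors are uncorrelated to about .51 when they correlate .90.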

An interesting paradox of multiple correlation and an exception to the
fact mentioned in the previous paragraph is that it is possible to increase
prediction by utilizing a variable which shows no, or low, correlation with
the criterion, provided it correlates well with a variable which does correlate
with the criterion. Thus, if r_{12} = .400, r_{13} = .000, and r_{23} = .707, the
regression equation will be z'_1 = .800z_2 - .566z_3, and r_{1.23} will equal .566.
It is thus seen that, when z_3 is combined with z_2, an appreciable gain in
prediction occurs even though when taken alone z_3 is worthless as a
predictor of z_1.
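
The figures in this example can be checked with the usual three-variable solution for standard-score regression weights. This is a sketch under the assumption that β_2 = (r_{12} − r_{13} r_{23}) / (1 − r²_{23}), β_3 = (r_{13} − r_{12} r_{23}) / (1 − r²_{23}), and r²_{1.23} = β_2 r_{12} + β_3 r_{13}; the function name is hypothetical.

```python
import math

def standard_score_regression(r12, r13, r23):
    """Return (beta_2, beta_3, multiple r) for predicting z_1 from z_2 and z_3."""
    denominator = 1 - r23 ** 2
    beta_2 = (r12 - r13 * r23) / denominator
    beta_3 = (r13 - r12 * r23) / denominator
    multiple_r = math.sqrt(beta_2 * r12 + beta_3 * r13)
    return beta_2, beta_3, multiple_r

# Variable 3 has zero correlation with the criterion but correlates .707
# with variable 2.
beta_2, beta_3, multiple_r = standard_score_regression(0.400, 0.000, 0.707)
print(round(beta_2, 3), round(beta_3, 3), round(multiple_r, 3))
```

The weight for z_3 comes out negative even though r_{13} is zero, and the multiple r exceeds r_{12} = .400, which is the gain in prediction described in the paragraph.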

Such a variable has been termed a “suppressant.” We do not quickly
see just how a suppressant variable, showing no correlation with the
criterion, can increase the accuracy of prediction. Perhaps this point can
be explained by reasoning by way of the notion that correlation can be
thought of in terms of common elements. Suppose that X_1 is composed of
10 elements, X_2 of 10, X_3 of 5, and suppose that X_1 and X_2 have 4 elements
in common, X_2 and X_3 have 5 elements in common, and X_1 and X_3 have no
overlapping elements. Diagrammatically, the variables and elements
would be

    X_1:  a a a a a a b b b b
    X_2:              b b b b c d d d d d
    X_3:                        d d d d d

By substituting in the common element formula (9.15) for correlation,
we find r_{12} = .400, r_{13} = .000, r_{23} = .707. These lead to z'_1 = .800z_2
- .566z_3, and r_{1.23} = .566. Variable X_3 has a negative regression weight,
i.e., by the use of X_3 something is being subtracted or suppressed. As set
up here for illustrative purposes, all the elements of X_3 are contained in X_2;

these elements are not related to X_1 and hence their presence in X_2 must
tend to lower the correlation between X_1 and X_2; if these elements could be
suppressed, the correlation between X_1 and X_2 minus the irrelevant (so far
as X_1 is concerned) elements of X_2 should be higher than r_{12}. Actually, if
we think of the d elements of the diagram as being nonexistent, we would
have variation in X_2 dependent on only 5 elements, 4 of which overlap with
X_1. The correlation between X_1 and the abridged X_2 would be 4/√(10 × 5),
or .566, which has exactly the same value as the multiple r obtained
previously. This exact correspondence to r_{1.23} will be obtained only when
all the X_3 elements are contained in X_2. If X_3 contains other elements, its
use as a suppressant will aid in predicting X_1, but the resulting r_{1.23} will not
correspond to an r deducible from the common element formula. The
reason for this is left as an exercise.

§ Ezekiel, M., Methods of correlation analysis, New York: John Wiley and Sons, 1959.
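
The common-element reasoning can be reproduced by counting shared elements. The sketch below assumes that formula (9.15) has the form r = (number of common elements) / √(n_a × n_b), which is consistent with the 4/√(10 × 5) computation in the text; the element labels and the function name are illustrative.

```python
import math

def common_element_r(elements_a, elements_b):
    """Correlation implied by the number of elements two variables share."""
    shared = len(elements_a & elements_b)
    return shared / math.sqrt(len(elements_a) * len(elements_b))

# Elements named after the diagram: six a's, four b's, one c, five d's.
a = {f"a{i}" for i in range(6)}
b = {f"b{i}" for i in range(4)}
c = {"c0"}
d = {f"d{i}" for i in range(5)}

x1 = a | b          # 10 elements
x2 = b | c | d      # 10 elements
x3 = d              # 5 elements, all contained in x2

print(round(common_element_r(x1, x2), 3))      # r_12
print(round(common_element_r(x1, x3), 3))      # r_13
print(round(common_element_r(x2, x3), 3))      # r_23
# X_2 with the d elements suppressed: only b and c remain.
print(round(common_element_r(x1, x2 - d), 3))
```

The last value, about .566, agrees with the multiple r of the suppressant example, as the text states it should when every element of X_3 lies within X_2.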

The student, by resort to the notion of common elements, may secure
a better understanding of the proposition that a higher multiple is obtainable
when the correlations with the criterion are high and the correlations
between the predictors low or zero. The reader should be warned, however,
that such a condition is hard to realize in practice, as is also the finding of
variables which will qualify as suppressants.


NOTE ON NOTATION 

The symbol r_{1.23} has been used to represent the correlation (multiple)
between X_1 and the best combination of X_2 and X_3. This should not be
confused with r_{12.3}, which indicates the correlation (partial) between X_1 and
X_2 with the effect of X_3 ruled out or held constant. The symbol S_{y.x}, it will
be recalled, stood for the standard error of estimate of Y as estimated from
X; S_{1.2} would be the error of X_1 when estimated from X_2; and S_{1.23} would
be the standard error of estimate of X_1 when estimated from X_2 and X_3 by
means of the multiple regression equation.

In the foregoing discussion, β_2 has been used as the symbol for the
regression weight of X_2. A more formal, albeit cumbersome, notation
would be β_{12.345}, which would be read as the regression of X_1 on X_2, i.e.,
the coefficient for X_2, when used in combination with X_3, X_4, and X_5. It
is not an accident that the subscript pattern resembles that for the partial
correlation coefficient. If we were dealing with a three-variable problem,
β_2 could be written as β_{12.3}. This notation really means that we have the
net regression of X_1 on X_2 when X_3 is held constant. Hence the coefficients
are sometimes spoken of as partial regression coefficients. As a matter
of fact, these partial or multiple regression coefficients can be computed by
way of partial correlation coefficients, but the method is not nearly so
straightforward and self-checking as the Doolittle procedure.



Chapter 12 

OTHER CORRELATION METHODS


The product moment correlation measure is applicable only when the 
two variables are graduated, is restricted by the assumption of linearity of 
regression, and needs careful qualifying if either or both variables yield 
skewed distributions. There are, therefore, many problems for which it is 
inappropriate. In general, the majority of the situations which are met in 
practice can be handled by some type of correlational technique. 

There are no general rules to follow in the case of variables yielding
skewed distributions. Frequently, we can use a logarithmic transformation
of such a variable and thereby secure scores which are at least approximately
normal; or we may deliberately normalize the distribution by
converting the raw scores into T scores. When we consider the arbitrary
units involved in most psychological measurement, such a procedure
would seem not only permissible but also defensible in that the correlational
description of the relationship need not be qualified because of
skewness.

The situations arising most frequently in practice, for which measures 
of correlation are apt to be needed, can be subsumed under the following 
six headings: (1) graduated measures for one variable, dichotomized or 
two-category information for the second variable; (2) both variables 
dichotomized; (3) three or more categories for one variable and two or 
more for the second; (4) three or more categories for one variable and a 
graduated series of measures for the other; (5) both variables graduated, 
with curvilinear relationship; (6) when data are rank-orders. 

An estimate of the degree of correlation for each of the foregoing
situations can be obtained provided certain assumptions concerning the
variables can be regarded as tenable. Ordinarily the graduated variable
can be thought of either as being continuous or as progressing in a sufficient
number of discrete steps so as to give the appearance of continuity. The
approach to normality for such series can, obviously, be specified. The
nature of the categorized variable, whether discrete or continuous, can
ordinarily be ascertained on logical grounds, but the question whether a
continuous variable for which we have only a distribution by categories
would yield a normal distribution if we had some measuring stick for the
trait is not easy to answer.

BISERIAL CORRELATION 

When one variable is measured in a graduated fashion and the other is in
the form of a dichotomy, we have the so-called biserial situation, for which
there are two measures of correlation: biserial r and point biserial r. The
difference between these two measures depends essentially on the type of
assumption which is made concerning the nature of the dichotomized
variable.

The most typical example of situations calling for one or the other of
these measures is to be found in the test (mental and personality) field:
the correlation between an item scored as pass or fail (yes or no, like or
dislike, etc.) and a graduated criterion variable (or a total score on all of a
set of items). We need to know each individual's score on the graduated
variable and the dichotomy to which he belongs. Then we can make a
distribution or scattergram with from 12 to 20 intervals for the graduated
variable along the y axis and with two intervals for the dichotomy along
the x axis. Such a correlation scattergram is given in Table 12.1, which
involves pass-fail on “abstract words” vs. composite IQ on Forms L and M
of the 1937 Stanford-Binet. It is obvious that there is a tendency for those
who fail the item to have lower IQs than those who pass; performance on
the item is correlated with IQ.

If it can be assumed that underlying the
dichotomy there is a continuous variable, we can obtain a measure of
correlation which is an estimate of what the product moment correlation
would be in case the dichotomous variable were measured in such a way
as to produce a normal distribution. This estimate is given by

$$ r_b = \frac{(M_2 - M_1)(p_1 p_2)}{z S_y} \qquad (12.1) $$

or by the exact equivalent

$$ r_b = \frac{(M_2 - M_y) p_2}{z S_y} \qquad (12.2) $$

in which

p_1 = proportion of cases in the first category.
p_2 = proportion of cases in the second category.
M_1 = mean of Ys for cases in the first category.
M_2 = mean of Ys for cases in the second category.
M_y = mean of all the Y scores.
S_y = standard deviation of all the Y scores.
z = ordinate for the unit normal curve at the point where p_1 (or p_2) of the
cases are cut off; it is determined by entering p_1 or p_2, whichever is
smaller, as a q value in Table A, then reading off the adjacent value in
the fourth column of the table (interpolating if necessary).

Table 12.1. Biserial table for “abstract words” as X and Binet IQ as Y

                       Item
    IQ          Fail (1)   Pass (2)   Totals
    145-149                    1          1
    140-144
    135-139                    1          1
    130-134                    3          3
    125-129                    4          4
    120-124                    6          6
    115-119                   10         10
    110-114                    7          7
    105-109         1          8          9
    100-104         1          5          6
    95-99           4          9         13
    90-94           7          6         13
    85-89           9          2         11
    80-84           3          1          4
    75-79           4                     4
    70-74           5                     5
    65-69
    60-64           3                     3

    Totals         37         63        100
    Means       84.43     109.86     100.45

    S_y = 17.69,  p_1 = .37,  p_2 = .63,  z = .378

By formula (12.1):

$$ r_b = \frac{(109.86 - 84.43)(.37)(.63)}{(.378)(17.69)} = .89 $$

Or by formula (12.2):

$$ r_b = \frac{(109.86 - 100.45)(.63)}{(.378)(17.69)} = .89 $$

$$ r_{pb} = \frac{109.86 - 100.45}{17.69} \sqrt{\frac{.63}{.37}} = .69 $$
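
The two computations of biserial r shown with Table 12.1 can be checked as follows. This is a minimal sketch, assuming formula (12.1) is (M_2 − M_1)(p_1 p_2)/(z S_y) and formula (12.2) is (M_2 − M_y) p_2/(z S_y); the ordinate z is obtained from Python's `statistics.NormalDist` in place of Table A, and the function names are hypothetical.

```python
from statistics import NormalDist

def normal_ordinate_at_cut(p):
    """Ordinate of the unit normal curve at the point cutting off proportion p."""
    unit_normal = NormalDist()
    return unit_normal.pdf(unit_normal.inv_cdf(p))

def biserial_r_from_category_means(mean_1, mean_2, p1, p2, sd_y):
    """Biserial r in the form of formula (12.1)."""
    z = normal_ordinate_at_cut(p1)
    return (mean_2 - mean_1) * p1 * p2 / (z * sd_y)

def biserial_r_from_total_mean(mean_2, mean_y, p1, p2, sd_y):
    """Biserial r in the form of formula (12.2)."""
    z = normal_ordinate_at_cut(p1)
    return (mean_2 - mean_y) * p2 / (z * sd_y)

# Summary values from Table 12.1 (fail = first category, pass = second).
r_b_first = biserial_r_from_category_means(84.43, 109.86, 0.37, 0.63, 17.69)
r_b_second = biserial_r_from_total_mean(109.86, 100.45, 0.37, 0.63, 17.69)
print(round(normal_ordinate_at_cut(0.37), 3))
print(round(r_b_first, 2), round(r_b_second, 2))
```

Because the ordinate is the same whether p_1 or p_2 is cut off, either proportion may be passed to `normal_ordinate_at_cut`. Both forms give about .89 for these data.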

Formula (12.2) is the more convenient when each of a series of items is to
be correlated with the same graduated variable. [...]

It is assumed not only that a normal distribution
underlies the dichotomy but also that the regressions would be linear if the
[...] checked. [...] assumption has troubled many. Actually,

certaMy Jr^sumpdve evidence for continuity, and a similar argumeffi can 

~;s=ssss^= 

worrv about the mathematical assumption of normality when u 1 g s- 

imtifv the use of r, with obviously continuous variables by sayi g, 

$$ S_{r_b} = \frac{\dfrac{\sqrt{p_1 p_2}}{z} - r_b^2}{\sqrt{N}} \qquad (12.3) $$


moment r as given by the analogous classical form, S r ( /' 

“ " £-« h * “ srgss 

incidentally will not be counterbalanced entirely by the n atively greater 
formula J 2)VoSd o^ercom^Sifficuky^since it is always po^ible to 

*» wseriai «■.*■ iar ® e iS" rs 

Since no r to z transformation is available for use with biserial r, the
difficulty of skewed sampling distributions for high r_b's cannot be overcome.
In testing the null hypothesis (that no correlation exists), the r_b² term of
formula (12.3) may be dropped. For N

mT^M ‘ 18 P ° SSib,e ^ ^ ° f the ' t6St f0r the diff erence between 

i JT 7 ”, " ,d r' d “ a ' '»' P*“°M r « simply lhe 

estimate would not equal S y Vl — r 2 b 

eJfjJ haV ^ ) ? C ° re t0 USe in P redictin S an individual’s X category we 

7 ne V t 6 8 ' Ven indiVidUal ’ S X P° sition * m the first category t 
75 per cent of the time the prediction would be correct. But sfcl/a ner ’ 

c ntage statement might itself be subject to grave sampling e" nc e ft s‘ 

ba d on a small N; and such a statement of error mightTefTo be 
qualified according to the ps. Why? S t0 be 

Point biserial, r_pb. If the dichotomous trait is truly discrete, an appropriate
measure of correlation is given by

$$ r_{pb} = \frac{M_2 - M_1}{S_y} \sqrt{p_1 p_2} \qquad (12.4) $$

or its equivalent

$$ r_{pb} = \frac{M_2 - M_y}{S_y} \sqrt{\frac{p_2}{p_1}} \qquad (12.5) $$


The relation between r_pb and r_b can be seen by
examining the following connection between the two coefficients:

$$ r_{pb} = r_b \left( \frac{z}{\sqrt{p_1 p_2}} \right) $$

For a 50-50 cut, z = .3989 and r_pb = .798r_b, and as the dichotomy
departs farther and farther from 50-50 the discrepancy between r_pb and r_b
increases. For a 10-90 cut we have r_pb = .585r_b. The maximum degree of
correlation between a dichotomous variable and a normally distributed
variable will occur when there is no overlap between the Y distributions
for the two categories. For such a situation r_b will be either +1.00 or
-1.00 regardless of the cut, whereas r_pb will be ±.798 for a 50-50 cut and
only ±.585 for a 10-90 cut. These two coefficients are not on the same
scale; they will agree only when there is exactly no relationship between
the two variables. Even if the dichotomous variable were a genuine point
variable, r_pb as an expression of the degree of relationship would not be
comparable either to r_b or to the product moment r between two variables
measured in a graduated fashion.
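
The multipliers .798 and .585 quoted above can be verified directly. The sketch assumes the connection r_pb = r_b × z/√(p_1 p_2) and the form r_pb = [(M_2 − M_1)/S_y]√(p_1 p_2) for formula (12.4); the function names are hypothetical and the normal ordinate comes from `statistics.NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist

def point_biserial_factor(p1):
    """z / sqrt(p1 * p2): the multiplier that converts r_b into r_pb."""
    p2 = 1.0 - p1
    unit_normal = NormalDist()
    z = unit_normal.pdf(unit_normal.inv_cdf(p1))
    return z / math.sqrt(p1 * p2)

def point_biserial_r(mean_1, mean_2, p1, p2, sd_y):
    """Point biserial r in the form of formula (12.4)."""
    return (mean_2 - mean_1) / sd_y * math.sqrt(p1 * p2)

# Multiplier for progressively more extreme dichotomies.
for cut in (0.50, 0.37, 0.10):
    print(cut, round(point_biserial_factor(cut), 3))

# Point biserial for the Table 12.1 summary values.
print(round(point_biserial_r(84.43, 109.86, 0.37, 0.63, 17.69), 2))
```

Since the multiplier is also the value r_pb takes when r_b is 1.00, it gives the ceiling on r_pb for a given cut.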

Despite the fact that true point variables are practically nonexistent in
psychology and despite the difficulties of interpreting r_pb as a terminal
descriptive statistic, r_pb has a rightful place in certain analytical and practical
work where the two categories are arbitrarily, for convenience, assigned
point scoring values of, say, 0 and 1. For example, if a dichotomized variable
with point scoring were included in an n-variable multiple regression
equation, point biserial rs would be the correct values for the correlation
of the dichotomized variable with the remaining n - 1 variables.

For the large sample situation the significance of r_pb (as a deviation from
zero) may be tested by using 1/√N as its standard error. For small
samples, the t test for the difference, M_2 - M_1, is appropriate.

A troublesome difficulty with the biserial coefficient, r b , is that it 
occasionally exceeds unity. The usually given explanation for this is that 
the assumption of normality for the dichotomous variable is not tenable, 
but it seems more likely that when such rs occur it is because the graduated 
variable, for the combined categories, is either platykurtic or bimodal in 
distribution. 

TETRACHORIC CORRELATION 

When both variables yield only dichotomized information, as, for
example, two items scored as passed or failed, it is possible to secure an
estimate of what the correlation would be if the underlying traits were
continuous and normally distributed or if they were so measured as to give
normal distributions. The measure of relationship for such a situation
is known as the tetrachoric correlation coefficient, usually designated as r_t.
It is not feasible to derive here the formula for tetrachoric correlation, but
perhaps a few words will help us understand the reasoning back of the
formula.

Let us suppose that we have before us a scattergram for the correlation
between height and weight; let us further assume that this scatter exhibits
all the characteristics of a normal correlational surface as defined by
equation (9.16). That is, the two marginal distributions and all the vertical
and horizontal array distributions are normal; the regressions are linear;
and the arrays homoscedastic. For such a normal plot, it is possible,
knowing the degree of correlation and the Ms and Ss of the two variables,
to specify how many or what proportion of the cases will fall in any given
segment of the scatter plot. This can be done by mathematical manipulation
of formula (9.16) or by the aid of Table VIII of Pearson's Tables for
statisticians and biometricians, part II.*

Now, of course, if the student had placed before him a scatter for height
vs. weight and were asked how many cases fell in that portion of the table
below 120 pounds and also below 68 inches, he would simply count them.
But suppose he were told that, when the two axes were cut at 120 pounds
and 68 inches, the frequencies in each of the four quadrants so formed were
as shown in Table 12.2. The purpose of tetrachoric correlation is to
ascertain the degree of correlation which would permit the observed
frequencies in such a fourfold table. A more rigorous statement would be:
Given the four frequencies, what should be the true correlation (for the
scatter underlying the fourfold table) in order to make the obtained four
frequencies most likely?

Table 12.2. Correlation for height and weight dichotomized

                     Below      Above
                     120 lb.    120 lb.
    Above 68 in.       10         80
    Below 68 in.       60         50
                       70        130       200

In order to secure this estimate it is necessary to convert into a proportion
each of the four frequencies and each of the marginal totals by dividing
by N. For the fourfold table we may symbolize the frequencies as in
Table 12.3, the proportions as in Table 12.4. Then, the tetrachoric
coefficient can be obtained from the following rather forbidding equation:

$$ \frac{c - qq'}{zz'} = r_t + xy \frac{r_t^2}{2} + (x^2 - 1)(y^2 - 1) \frac{r_t^3}{6} + (x^3 - 3x)(y^3 - 3y) \frac{r_t^4}{24} + \cdots \qquad (12.6) $$

in which it is assumed that both q and q' are less than .50. The general rule

* Pearson, Karl, Tables for statisticians and biometricians, part II, Cambridge:
Cambridge University Press, 1931.

[...] formula. Thus one can have c - qq' (as given), or [...]

Table 12.3. Frequencies

              -          +
    +         A          B        A + B
    -         C          D        C + D
            A + C      B + D        N

Table 12.4. Proportions

              -          +
    +         a          b          p
    -         c          d          q
              q'         p'        1.0

† Chesire, L., Saffir, M., and Thurstone, L. L., Computing diagrams for the tetrachoric
correlation coefficient, Chicago: University of Chicago Bookstore, 1933.

information, but it can also be used instead of biserial r or the product
moment r, since situations for which these two methods apply can readily
be converted into fourfold tables by simply dichotomizing the graduated
variables. The advantage of so estimating correlation is that tetrachoric r
is much easier to determine (by using the computing diagrams) than is
calculating either biserial r or the product moment r. Indeed, this fact of
computational economy has led a number of investigators to use r_t when
product moment rs could be determined. That such a practice may be
false economy becomes quite evident when we turn to the sampling
fluctuation of r_t.
The standard error of r_t is closely approximated by

$$ S_{r_t} = \frac{\sqrt{p q p' q'}}{z z' \sqrt{N}} \sqrt{(1 - r_t^2) \left[ 1 - \left( \frac{\sin^{-1} r_t}{90^\circ} \right)^2 \right]} \qquad (12.7) $$


When this is compared to the classical formula for the standard error of a
product moment r, i.e., to S_r = (1 - r²)/√N, it will be seen that the
tetrachoric r has a much larger sampling error. To illustrate the difference,
the errors for four r_t's for two different dichotomies are presented in Table
12.5 along with the errors (by the classical formula) of the corresponding
product moment rs for N = 100.

Table 12.5. Sampling errors of r_t and r compared

    r or r_t     p      p'     S_rt    S_r
      .00       .50    .50     .157    .100
      .00       .80    .80     .204    .100
      .40       .50    .50     .130    .084
      .40       .80    .80     .182    .084
      .60       .50    .50     .115    .064
      .60       .80    .80     .150    .064
      .80       .50    .50     .073    .036
      .80       .80    .80     .095    .036


It can readily be seen from this table that r_t is much less stable than r; in
fact, even for the most favorable comparison (.50-.50 cuts, low rs) the
standard error of the tetrachoric coefficient is more than 50 per cent greater
than that for the product moment coefficient. This means that we must
have more than twice as many cases to attain the same degree of sampling
stability for a tetrachoric as for a product moment correlation coefficient.
For .80-.20 cuts and low correlations, four times as many cases are needed
to have comparable sampling errors. For high correlations and also for
more extreme cuts, r_t compares still less favorably with r.
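
The comparison of sampling errors can be explored with a short sketch. It assumes that formula (12.7) has the common textbook form S_rt = [√(pqp'q') / (zz'√N)] √{(1 − r_t²)[1 − (sin⁻¹ r_t / 90°)²]} and that the product moment error is (1 − r²)/√N; the function names are hypothetical. Because the formula is an approximation, values for nonzero r need not agree with the printed table to the third decimal.

```python
import math
from statistics import NormalDist

def standard_error_tetrachoric(r_t, p, p_prime, n_cases):
    """Approximate standard error of tetrachoric r (assumed form of 12.7)."""
    unit_normal = NormalDist()
    q, q_prime = 1.0 - p, 1.0 - p_prime
    z = unit_normal.pdf(unit_normal.inv_cdf(p))
    z_prime = unit_normal.pdf(unit_normal.inv_cdf(p_prime))
    base = math.sqrt(p * q * p_prime * q_prime) / (z * z_prime * math.sqrt(n_cases))
    arcsine_ratio = math.degrees(math.asin(r_t)) / 90.0
    return base * math.sqrt((1.0 - r_t ** 2) * (1.0 - arcsine_ratio ** 2))

def standard_error_product_moment(r, n_cases):
    """Classical standard error of a product moment r."""
    return (1.0 - r ** 2) / math.sqrt(n_cases)

for p in (0.50, 0.80):
    se_t = standard_error_tetrachoric(0.0, p, p, 100)
    se_r = standard_error_product_moment(0.0, 100)
    # The squared ratio indicates how many times as many cases r_t needs.
    print(p, round(se_t, 3), round(se_r, 3), round((se_t / se_r) ** 2, 2))
```

Squaring the ratio of the two errors gives the multiple of cases needed for equal stability: roughly 2.5 for 50-50 cuts and a little over 4 for 80-20 cuts when the correlation is zero.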

The foregoing discussion and further study of formula (12.7) lead to two 
obvious conclusions. 

First, the increasing sampling instability of r t as the dichotomies become 
more extreme warns us that, unless N is large, we cannot place much 
reliance on r t for cuts more extreme than .10-.90; seldom will N be large 
enough to warrant confidence in a tetrachoric based on cuts more extreme 
than .05-.95. 

Second, in using r t instead of the product moment r when the latter is 
calculable, we are always throwing away the equivalent of more than half 
the available data. Thus the computational economy may be an expensive 
luxury—it is very doubtful whether the calculation of a product moment r 
for N cases will ever require anything but a fraction of the expense of 
securing data on the additional N cases needed to counterbalance the 
greater sampling error incurred in using the tetrachoric coefficient. 

As in the case of r_b, no r to z transformation exists for handling the
sampling errors of high tetrachorics. For testing the null hypothesis, that r_t
for the universe is zero, we may use a simpler expression for its standard
error, namely, S_rt = √(pqp'q')/(zz'√N). Another method for judging the
significance of the correlation computed from a fourfold table will be
presented in the next chapter.

The use of tetrachoric r is circumscribed by an assumption that the
underlying correlational surface is of the normal type. Among other things
this implies (1) that the dichotomized traits are continuous and normally
distributed, and (2) that the regressions are linear. Although, as discussed
in connection with biserial r, we are usually ignorant of the tenability of 1,
this ignorance can be partially overcome if the correlation is regarded as
that which would obtain if the traits were normalized; i.e., it can be
argued that the use of tetrachoric r automatically normalizes the distributions.
It is not so easy to dispose of assumption 2, since the normalizing of
variables will not necessarily lead to linearity of regression. The only
consolation here is that measured psychological traits are usually linearly
related, if related at all.

FOURFOLD POINT CORRELATION 

If we can safely assume point distributions for both dichotomous variables,
a descriptive measure of correlation can be obtained from a fourfold
table (Table 12.3) by

$$ r_p = \frac{BC - AD}{\sqrt{(A + B)(C + D)(A + C)(B + D)}} \qquad (12.8) $$

or from the table of proportionate frequencies (Table 12.4) by the exact
equivalent

$$ r_p = \frac{bc - ad}{\sqrt{p q p' q'}} \qquad (12.9) $$

The fourfold point correlation coefficient is frequently referred to as the
phi coefficient and designated by φ. Actually, it is the product moment
correlation between the two variables each scored in a point fashion (say,
0 and 1). Unlike the point biserial, r_p can be unity but only when p = p'.
Otherwise (i.e., in nearly all situations) r_p and r_t from the same table will
differ in value, with r_p being lower, and the difference between the two
becomes greater as the dichotomy for either variable, or both, varies
farther and farther from 50-50.

A few examples will illustrate the difference in the magnitude of r_p and
r_t. It is possible to have a fourfold table with 50-50 and 50-50 cuts which
yields an r_t of .50 and an r_p of .32, and a table with 16-84 and 16-84 cuts
which yields an r_t of .50 and an r_p of only .26. For similar tables (as regards
cuts) we may have r_p values of .59 and .52 when r_t is .80. Thus, r_p is not
interpretable on the same scale as r_t (or r or r_b) as a measure (terminal
descriptive statistic) of the degree of relationship.

However, r_p is useful (and necessary) in certain analytical work. If
variable U and variable V were dichotomous and each scored as 0 and 1,
then r_p would be the appropriate value to use in formula (9.9) to obtain the
variance of W, defined as U + V. If formula (5.5) for the standard error of
the difference between correlated proportions were written analogously to
formula (6.8), r_p would be used. It is also used in the statistical theory of
mental tests.

For testing whether r_p deviates significantly from zero we may safely use
1/√N as its standard error when N is not small.
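
As an illustration, the fourfold point coefficient can be computed for the height and weight frequencies of Table 12.2. The sketch assumes the frequency form (BC − AD)/√[(A + B)(C + D)(A + C)(B + D)] and the proportion form (bc − ad)/√(pqp'q'), with A and B in the top row and C and D in the bottom row; the function names are hypothetical.

```python
import math

def phi_from_frequencies(a, b, c, d):
    """Fourfold point correlation from the four cell frequencies."""
    return (b * c - a * d) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def phi_from_proportions(a, b, c, d):
    """Same coefficient computed after converting frequencies to proportions."""
    n = a + b + c + d
    pa, pb, pc, pd = a / n, b / n, c / n, d / n
    p, q = pa + pb, pc + pd                # row margins
    q_prime, p_prime = pa + pc, pb + pd    # column margins
    return (pb * pc - pa * pd) / math.sqrt(p * q * p_prime * q_prime)

# Table 12.2: top row 10 and 80, bottom row 60 and 50.
print(round(phi_from_frequencies(10, 80, 60, 50), 3))
print(round(phi_from_proportions(10, 80, 60, 50), 3))
```

Both forms return the same value, about .45 for these frequencies, which shows that the two formulas are exact equivalents.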


CONTINGENCY COEFFICIENT 

The contingency coefficient is a measure of the degree of association or
correlation which exists between variables for which we have only categorical
information. The number of categories can be such as to provide a
2 by 2 table (as for tetrachoric correlation) or a 2 by 3, or a 3 by 3, or a
3 by 4, or a 4 by 4, or a k by l table. This coefficient is stated in terms of a
quantity known as χ² (chi square); thus

$$ C = \sqrt{\frac{\chi^2}{N + \chi^2}} \qquad (12.10) $$

where

$$ \chi^2 = \sum \frac{(O - E)^2}{E} \qquad (12.11) $$

in which O is the observed frequency (not percentage) and E is the
expected frequency for a given cell. In a 2 by 3 table there would be six
cells, hence six values summed to get χ². The expected cell frequencies in
the contingency situation are those frequencies which would exist if there
were no association or relationship between the given variables. It can
thus be anticipated that, the larger the discrepancy between expected and
observed frequencies relative to the expected, the larger the value of χ² and
consequently the higher the value of C.

An example will help to clarify the preceding. Suppose that we have two
characteristics, each of which yields three categories or classifications, and
that the observed frequencies are as given in Table 12.6, which also contains
the expected frequencies in parentheses. (Fictitious data; marginal
frequencies arranged so as to simplify exposition.)

Table 12.6. Contingency table

                     Low         Medium        High
    College         5 (20)      45 (60)      50 (20)      100
    High school    50 (40)     110 (120)     40 (40)      200
    Grade school   45 (40)     145 (120)     10 (40)      200
                    100          300          100         500

To ascertain the expected frequencies needed in the computation of χ², we
ask what frequencies would be expected if there were no
association between the two variables. Consider the 100 classified as
college; if no association existed, we would expect that these 100 would be
distributed according to a 1, 3, 1 ratio, i.e., in the same ratio as the
marginal frequencies at the bottom. Thus the expected cell frequencies for
the top row of cells would be 20, 60, 20. The expected frequencies for the
middle and bottom rows of cells should also be in a 1, 3, 1 ratio. Both these
rows would have expected frequencies of 40, 120, 40.

It will be noted that (1) the expected frequencies for the columns have,
as they should, the ratio of 1, 2, 2, i.e., the ratio of 100, 200, 200 for the
marginal frequencies on the right; (2) the expected frequencies sum to the
same marginal totals as the observed frequencies; and (3) the expected
frequencies actually exhibit a zero relationship between the two
characteristics.

In practice, the computation of the expected frequencies can readily be
accomplished by either of two schemes: (1) express the marginal totals
along the bottom as proportions of the total N, then multiply each of the
frequencies on the right margin by each proportion in turn, entering the
resulting product in the cell common to the two marginal figures involved
in the multiplication; or (2) multiply any frequency on the bottom margin
by any frequency on the right margin, and then divide this product by N;
the result is the expected frequency for the cell common to the two
marginals involved in the products.

The computation of $\chi^2$ is now a routine matter. We simply take each
cell in turn, square the difference between the observed and expected
values, and divide by the expected frequency. Thus we have

    (5 - 20)^2/20    = 11.25
    (45 - 60)^2/60   =  3.75
    (50 - 20)^2/20   = 45.00
    (50 - 40)^2/40   =  2.50
    (110 - 120)^2/120 =   .83
    (40 - 40)^2/40   =   .00
    (45 - 40)^2/40   =   .62
    (145 - 120)^2/120 =  5.21
    (10 - 40)^2/40   = 22.50

The sum of these quantities, 91.66, is $\chi^2$. To get C, the coefficient of
contingency, the value of $\chi^2$ is substituted in formula (12.10), thus

\[ C = \sqrt{\frac{91.66}{500 + 91.66}} = .39 \]
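The arithmetic for Table 12.6 can be checked with a short program. The following is only a sketch: the function names are our own, scheme (2) above (row total times column total, divided by N) is used for the expected frequencies, and C is taken as the square root of $\chi^2/(N + \chi^2)$, which is the form the substitution above implies for formula (12.10).

```python
# Observed frequencies from Table 12.6
# (rows: College, High school, Grade school; columns: Low, Medium, High).
observed = [
    [5, 45, 50],
    [50, 110, 40],
    [45, 145, 10],
]


def expected_frequencies(table):
    """Expected cell frequencies under no association: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum of (O - E)^2 / E over all cells of the table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def contingency_coefficient(table):
    """C = sqrt(chi2 / (N + chi2)); assumed form of formula (12.10)."""
    n = sum(sum(row) for row in table)
    chi2 = chi_square(table)
    return (chi2 / (n + chi2)) ** 0.5


expected = expected_frequencies(observed)
chi2 = chi_square(observed)
coefficient = contingency_coefficient(observed)
print(expected)
print(round(chi2, 2), round(coefficient, 2))
```

Because the program carries full precision rather than summing nine rounded terms, its $\chi^2$ may differ from the hand-computed 91.66 in the second decimal place; C is unaffected at two places.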


This strength of association is not to be interpreted as indicating the
same degree of relationship as an ordinary (or biserial or tetrachoric)
coefficient of the same magnitude. One reason for this is that the upper
limit for the contingency coefficient is a function of the number of
categories. The upper limit for a 2 by 2 table is $\sqrt{1/2}$; for a 3 by 3
table, $\sqrt{2/3}$; for a 4 by 4 table, $\sqrt{3/4}$; for a 5 by 5 table,
$\sqrt{4/5}$; for a k by k table, $\sqrt{(k - 1)/k}$. The exact upper limits
for rectangular tables, such as 2 by 3, 2 by 4, 3 by 4, are unknown. (As an
exercise, the student might demonstrate to his own satisfaction the upper
limit for 2 by 2 and 3 by 3 tables.) The reader will also note that C can
never be negative.
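The stated limits can be illustrated numerically. The sketch below builds a hypothetical k by k table with every case on the main diagonal (a perfect association) and computes C for it, taking C as the square root of $\chi^2/(N + \chi^2)$. It shows only that such a table reaches $\sqrt{(k - 1)/k}$; it is not a proof that no other table can exceed that value. The function names and the cell count of 10 are our own assumptions.

```python
def contingency_coefficient(table):
    """C = sqrt(chi2 / (N + chi2)), with expected frequencies from the marginals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            chi2 += (o - e) ** 2 / e
    return (chi2 / (n + chi2)) ** 0.5


def diagonal_table(k, per_cell=10):
    """Hypothetical k by k table with all cases on the main diagonal."""
    return [[per_cell if i == j else 0 for j in range(k)] for i in range(k)]


# Compare C for the diagonal table with sqrt((k - 1) / k).
results = {}
for k in (2, 3, 4, 5):
    results[k] = (contingency_coefficient(diagonal_table(k)), ((k - 1) / k) ** 0.5)
    print(k, round(results[k][0], 4), round(results[k][1], 4))
```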

Despite having varying maximal values, contingency coefficients have a
decided advantage over other measures of relationship: no assumptions
involving the nature of the variables need be met. Continuous or discrete
variables, normal or skewed or any shaped distributions for underlying
traits, ordered or unordered series, and combinations thereof are permissible.

Disadvantages are that any two contingency coefficients are not comparable
unless derived from tables of the same size, that they are noncomparable
to product moment rs (and estimates thereof) unless certain corrections are
applied, and that the formula for sampling error is unwieldy. The necessary
corrections and the sampling error formula may be found in Kelley,† but
before consulting Kelley, the reader might bear in mind the following
comments.

In regard to the corrections, the first is for number of categories. The
additional correction to make C an estimate of r involves the assumption
that the underlying traits are continuous and normal in distribution.
Furthermore, this correction is very tedious to make. It is suggested that,
if the assumption of normally distributed continuous variables is tenable,
we are justified in reducing a contingency table of more than four cells to a
2 by 2 table and then determining the value of tetrachoric r. When reducing
to a fourfold table, we should combine adjacent categories so as to have
dichotomies as near to .50-.50 proportions as possible. The combination
should not be made on the basis of the pattern of cell frequencies, since this
is likely to involve a capitalization or decapitalization on chance. We
might take several or all possible fourfold combinations, thus securing
several tetrachoric rs which may then be averaged.

As to the unwieldy sampling error formula for C, it is suggested that
insofar as we wish simply to test the null hypothesis, i.e., that there is no
relationship between the two given variables, we need only enter the
value of $\chi^2$ into an appropriate probability table to test its
significance. If $\chi^2$ is significant, the relationship is significantly
greater than zero. This use of $\chi^2$ will be discussed in Chapter 13.

Chi square for a fourfold table can be readily obtained by formula
without first computing expected frequencies. Thus for a set of frequencies
like that of Table 12.3 we have

\[ \chi^2 = \frac{N(BC - AD)^2}{(A + B)(C + D)(A + C)(B + D)} \]

This resembles formula (12.8). In fact, there is a relationship between the
fourfold point coefficient ($r_p$), $\chi^2$, and C:

\[ r_p^2 = \frac{\chi^2}{N} \quad \text{and} \quad C = \sqrt{\frac{r_p^2}{1 + r_p^2}} \]
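A brief check may make the shortcut concrete. The sketch below assumes the fourfold layout has A and B in the top row and C and D in the bottom row (the layout implied by the four marginal sums in the formula), and it uses made-up frequencies. It compares the shortcut with the cell-by-cell computation and then obtains C both from $r_p$ and directly from $\chi^2$.

```python
def chi_square_fourfold(a, b, c, d):
    """Shortcut chi square for a 2 by 2 table laid out as  A B / C D  (assumed layout)."""
    n = a + b + c + d
    return n * (b * c - a * d) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_by_cells(a, b, c, d):
    """Same quantity computed from expected frequencies, cell by cell."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    cells = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n
            total += (cells[i][j] - e) ** 2 / e
    return total


# Hypothetical frequencies, for illustration only.
a, b, c, d = 30, 20, 10, 40
n = a + b + c + d

chi2_shortcut = chi_square_fourfold(a, b, c, d)
chi2_cells = chi_square_by_cells(a, b, c, d)

r_p_squared = chi2_shortcut / n                      # r_p^2 = chi2 / N
c_from_r = (r_p_squared / (1 + r_p_squared)) ** 0.5  # C from r_p
c_from_chi2 = (chi2_shortcut / (n + chi2_shortcut)) ** 0.5

print(round(chi2_shortcut, 3), round(chi2_cells, 3))
print(round(c_from_r, 4), round(c_from_chi2, 4))
```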

Other measures of association or of correlation between attributes have
been advocated. This is not the place to argue the pros and cons of these
other measures. It seems to the author that the measures we have discussed
are the more defensible.

† Kelley, T. L., Statistical method, pp. 266-271, New York: Macmillan, 1924.


THE CORRELATION RATIO OR $\eta$ (ETA)

It will be recalled that one way of understanding the product moment
correlation coefficient is to note from the relationship
$r^2 = 1 - S_{y \cdot x}^2 / S_y^2$ (or $r^2 = 1 - S_{x \cdot y}^2 / S_x^2$)
that the degree of correlation is a function of the error of estimate
variance relative to the total variance of the variable being predicted by a
linear regression line. If the array means fail to fall on a straight line,
it can rightly be argued that better prediction can be made by using a curve
which really "fits" the means or by using the means themselves. The latter
procedure would entail an error of estimate which would be a function of the
variance within the arrays about the array means. An over-all variance about
the means of the vertical arrays can be calculated by squaring the
deviations about the mean of each array, summing these for all arrays, and
then dividing by N. The resulting variance for the vertical arrays may be
labeled $S_{ay}^2$; for the horizontal arrays, $S_{ax}^2$.

The correlation ratio, $\eta$, in terms of the accuracy with which Ys can be
predicted from Xs is defined as

\[ \eta_{yx}^2 = 1 - \frac{S_{ay}^2}{S_y^2} \tag{12.12} \]

and for Xs predicted from Ys, we have

\[ \eta_{xy}^2 = 1 - \frac{S_{ax}^2}{S_x^2} \tag{12.13} \]

Are two $\eta$s necessary? We have not proved herein that the variance about
the mean is smaller than about any other point, but this fact is readily
deducible from the computational formula for S in terms of deviations
from an arbitrary origin. If AO coincides with the mean, $\sum d^2$ will
equal $\sum x^2$; if AO does not coincide with the mean, a subtractive term
will always be involved. It follows that $S_{ay}$ will be less than
$S_{y \cdot x}$ and that $S_{ax}$ will be less than $S_{x \cdot y}$; hence
both $\eta$s will exceed r, but to varying degrees, depending on the extent
to which the array means fail to fall on a straight line. Since it is
possible, and likely, that the means for the vertical arrays will not
exhibit the same departure from linearity as those for the horizontal
arrays, it is not reasonable to expect the two $\eta$s to agree.

The $\eta$s indicate the relative accuracy with which we can predict on the
basis of array means, and accordingly they are useful measures of the
extent of correlation when the regressions are curvilinear. The correlation
ratio can also be utilized when the regression is linear; hence it is more
generally applicable than the product moment coefficient, which is useful
only in the special case where the assumption of linearity is tenable. The
correlation ratio, however, does not enter into the regression equation.


Even if the regressions were exactly linear for some defined population,
a given sample would show deviations from linearity, and therefore $\eta$s
for successive samples would show chance sampling deviations from r. By
how much must $\eta$ exceed r before we suspect curvilinearity? The only
adequate statistical test for answering this question involves the analysis
of variance technique and hence is postponed to Chapter 15.

Another definition of $\eta$ can be had by starting with the proposition that
the variance $S_y^2$ can be broken down into components, a predictable and
an unpredictable part, or $S_y^2 = S_{my}^2 + S_{ay}^2$, in which $S_{my}^2$
is the variance of the array means weighted for the number of cases in the
several arrays. Then we have $\eta$ defined as
$\eta_{yx}^2 = S_{my}^2 / S_y^2$ and also as $\eta_{xy}^2 = S_{mx}^2 / S_x^2$.
These are analogous to $r^2 = S_{y'}^2 / S_y^2$ and $r^2 = S_{x'}^2 / S_x^2$,
and accordingly we may interpret $\eta_{yx}^2$ as the proportion of Y
variance explained by or associated with variation in X.

Since the $\eta$s are most readily computed by methods to be developed
later (pp. 278-80), no illustration will be given here.
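Although the convenient computing routine comes later, the two definitions of $\eta^2$ can be verified directly on a small set of scores. The following sketch uses invented data in which the Y scores are grouped into vertical arrays by X value; all variances are divided by N, as in the text. The variable names are our own.

```python
from statistics import fmean, pvariance

# Hypothetical data: Y scores grouped into vertical arrays, keyed by X value.
arrays = {
    1: [2, 3, 4],
    2: [5, 6, 7, 6],
    3: [7, 8, 9],
    4: [6, 7, 8, 7],
    5: [3, 4, 5],
}

all_y = [y for ys in arrays.values() for y in ys]
n = len(all_y)
grand_mean = fmean(all_y)

# Total variance of Y (divided by N).
s2_y = pvariance(all_y)

# Variance about the array means, pooled over all arrays and divided by N.
s2_ay = sum((y - fmean(ys)) ** 2 for ys in arrays.values() for y in ys) / n

# Variance of the array means, each weighted by the number of cases in its array.
s2_my = sum(len(ys) * (fmean(ys) - grand_mean) ** 2 for ys in arrays.values()) / n

eta2_from_error = 1 - s2_ay / s2_y   # form of (12.12)
eta2_from_means = s2_my / s2_y       # form based on the array means

print(round(eta2_from_error, 4), round(eta2_from_means, 4))
```

The two values agree because the total variance is the sum of the variance of the array means and the variance within arrays.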


RANK CORRELATION 

Rank-ordering by judges is frequently resorted to when no measuring
instrument is available for a trait. One measure of relationship between
variables for which we have individuals ranked is given by $\rho$ (rho), the
Spearman rank-difference correlation coefficient:

\[ \rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} \tag{12.14} \]

in which D is the difference between an individual's two ranks (for the two
traits). When we have ranks for one variable and scores for the other we
can use the scores as a basis for setting up ranks for the latter, and then
compute rho.

Whenever rankings on a given variable involve ties (the judges fail to
distinguish between two or more individuals or the scores used for ranking
are such that two or more persons have the same score), the ranks are split
between individuals who are in tie positions. Suppose three ranks have
been assigned and that two individuals are tied for the fourth position. If
they were distinguishable, they would use up ranks 4 and 5, so we assign
each a value of 4.5. Had three persons tied for this position, we would split
ranks 4, 5, and 6, giving each a rank of 5. Then when we proceed to the
remaining individuals we must remember that rank position 6 has been used.
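The splitting of tied ranks is easily mechanized. The routine below is a sketch with a name and scores of our own choosing; it ranks from the highest score downward and gives tied scores the mean of the rank positions they would occupy if distinguishable.

```python
def average_ranks(scores, highest_first=True):
    """Rank scores 1..N, giving tied scores the mean of the ranks they occupy."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=highest_first)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2  # mean of rank positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Hypothetical scores: two persons tie for the fourth position.
two_tied = average_ranks([98, 95, 91, 88, 88, 80, 75])
# Hypothetical scores: three persons tie for the fourth position.
three_tied = average_ranks([98, 95, 91, 88, 88, 88, 75])

print(two_tied)
print(three_tied)
```

In the second case the next person receives rank 7, since positions 4, 5, and 6 have all been used.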
The computation of rho is illustrated in Table 12.7. The fact that
the algebraic sum of the Ds must be zero can be utilized as a means of
checking the D column values.

Table 12.7. Computation of rank-difference
correlation coefficient

                Ranks           Differences
Persons      1st     2nd        D        D^2

   A           3      1         2       4
   B           4      2         2       4
   C          10     10         0       0
   D           8      4.5       3.5    12.25
   E           5      6        -1       1
   F           9     11        -2       4
   G           1      3        -2       4
   H           2      7        -5      25
   I          13     13         0       0
   J          11      4.5       6.5    42.25
   K           7      8.5      -1.5     2.25
   L           6      8.5      -2.5     6.25
   M          12     12         0       0

 Sums                           0     105.00 = $\sum D^2$

\[ \rho = 1 - \frac{6(105)}{13(169 - 1)} = .71 \]
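The computation in Table 12.7 can be reproduced as follows. This is a minimal sketch with our own function name; it takes the two sets of ranks as given in the table, applies the check that the Ds sum to zero, and then applies the rank-difference formula, rho = 1 - 6(sum of D squared)/[N(N squared - 1)].

```python
# Ranks for persons A through M, as given in Table 12.7.
first = [3, 4, 10, 8, 5, 9, 1, 2, 13, 11, 7, 6, 12]
second = [1, 2, 10, 4.5, 6, 11, 3, 7, 13, 4.5, 8.5, 8.5, 12]


def spearman_rho(ranks_1, ranks_2):
    """Rank-difference correlation: 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(ranks_1)
    diffs = [a - b for a, b in zip(ranks_1, ranks_2)]
    # Check on the D column: the algebraic sum of the differences must be zero.
    if abs(sum(diffs)) > 1e-9:
        raise ValueError("rank differences do not sum to zero")
    sum_d2 = sum(d * d for d in diffs)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


rho, sum_d2 = spearman_rho(first, second)
print(sum_d2, round(rho, 2))
```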


Rho for ranks based on scores for two normally distributed variables 
tends to be slightly (less than .02) lower than the product moment r 
computed from the scores; hence rho is comparable with r as a measure of 
the strength of relationship. 

To test the significance of rho, for N of 10 or more, we may safely use

\[ t = \rho \sqrt{\frac{N - 2}{1 - \rho^2}} \]

which approximates the t distribution with N - 2 degrees of freedom.
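As an illustration, the significance ratio for rho may be computed for the rho of .71 and N of 13 found in Table 12.7. The sketch assumes the usual form of the ratio, rho times the square root of (N - 2)/(1 - rho squared); the function name is our own, and the result would be referred to a t table with N - 2 = 11 degrees of freedom.

```python
import math


def rho_t_ratio(rho, n):
    """t = rho * sqrt((N - 2) / (1 - rho^2)), referred to t with N - 2 df."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))


t_value = rho_t_ratio(0.71, 13)
degrees_of_freedom = 13 - 2
print(round(t_value, 2), degrees_of_freedom)
```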

Rho does not possess the mathematical advantages inherent in r and
therefore has merit only when the observations on one or both variables
are ranks instead of measures. Because of judgmental difficulties in
assigning ranks for N large, rank-order data are apt to be confined to
small samples, but for N less than 10 the t test of the significance of rho is
not satisfactory. Kendall§ has proposed another measure, designated $\tau$
(tau), for use with ranks which is superior to rho insofar as testing
significance is concerned when N is very small. As a measure of the degree of
relationship, tau, like rho, has the property of being unity for a perfect
relationship; for zero and near zero correlation these two measures tend to
be alike numerically, but for other degrees of association tau tends to be
lower than rho, at times only two-thirds the magnitude of rho. Thus tau
is not comparable with rho (and r), and furthermore there seems to be no
specifiable way of estimating one from the other. For a much more
adequate discussion of both tau and rho, the reader is referred to Kendall.

§ Kendall, M. G., Rank correlation methods, London: Griffin, 1948.

THE DISCRIMINANT FUNCTION 

Suppose we have two or more variables (measured in a graduated
fashion) which we wish to combine into a total score for the purpose of
discriminating between two groups. The question arises as to how best to
weight the variables so as to obtain maximum difference between the total
score means for the two groups. This difference must be considered
relative to the within-groups variability; otherwise we could easily produce
a large numerical difference by the simple operation of summing the scores
and multiplying by a large constant, whereas the real purpose is to have
score distributions with the least amount of overlap for the two groups.
We want the difference to be maximal relative to the spread of scores
within the groups.

The simplest way to determine the weights for the several variables is to
compute the $\beta$s, thence the Bs, as in the multiple regression problem.
For this purpose, the product moment correlations among the two or more
independent variables are calculated, and the point biserial r is calculated
between each independent variable and $X_1$, the dependent variable
(membership in one or the other of the two groups, with one of the groups
consistently designated as corresponding to the first category for the
biserial setup).

Actually, since the problem here is that of ascertaining optimum relative
weights rather than fitting a regression plane, we need not calculate
the A of the regression equation nor worry about $S_1$ ($= \sqrt{p_1 p_2}$ of
the biserial setup). The weights may be taken simply as $\beta_2/S_2$,
$\beta_3/S_3$, etc., all multiplied by a constant so chosen as to have
weights which exceed, say, 10, thereby avoiding decimals. Some of the
weights may be negative, according to the sign of the corresponding $\beta$.
If all or a majority of the weights are negative, the signs of all may be
reversed. The relationship of the total of the optimally weighted scores to
group membership is describable by the multiple r computed by equation
(11.12). Such a multiple r is the point biserial between the total weighted
scores and belonging to one or the other of the two groups. Or we may
compute the weighted scores for all N cases and then make distributions for
the two groups separately in order to scrutinize the amount of
differentiation (or overlap) present.

CORRELATION OF SUMS (OR AVERAGES) 

There are times when it is useful to have a formula for the correlation
between two variables, each of which is made up as the sum (or average) of
two or more variables. As an introduction to the problem, consider the
situation in which one variable is obtained by summing three parts, the
other by summing two. In deviation units, let

\[ x = x_a + x_b + x_c \quad \text{and} \quad y = y_A + y_B \]

Then

\[ r_{xy} = \frac{\sum (x_a + x_b + x_c)(y_A + y_B)}{N S_x S_y}
= \frac{\sum x_a y_A + \sum x_a y_B + \sum x_b y_A + \sum x_b y_B + \sum x_c y_A + \sum x_c y_B}{N S_x S_y} \]

Each term in the numerator when divided by N will yield an r times the
product of two Ss, and the values of $S_x$ and $S_y$ will each be given by
the square root of the variance of a sum of correlated scores; hence we have

\[ r_{xy} = \frac{r_{aA} S_a S_A + r_{aB} S_a S_B + r_{bA} S_b S_A + r_{bB} S_b S_B + r_{cA} S_c S_A + r_{cB} S_c S_B}
{\sqrt{S_a^2 + S_b^2 + S_c^2 + 2 r_{ab} S_a S_b + 2 r_{ac} S_a S_c + 2 r_{bc} S_b S_c} \, \sqrt{S_A^2 + S_B^2 + 2 r_{AB} S_A S_B}} \]

as a formula in terms of the Ss for the component parts, the correlations
among the parts entering into x and ditto y, and the cross-correlations of
the x and y components (in the six numerator terms). This means that if
the given rs and Ss are available it is possible to obtain the correlation
between the specified sum scores without ever computing a sum score for
any of the N individuals.
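That the correlation of sums can be had from the component rs and Ss alone is readily confirmed on a small example. The sketch below uses invented scores for three x parts and two y parts. It computes the correlation once from the component standard deviations and intercorrelations (cross-correlation terms over the product of two radicals) and once directly from the summed scores. All names and data are hypothetical; variances are divided by N.

```python
from math import sqrt


def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """Mean cross-product of deviations (divided by N)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def sd(v):
    return sqrt(cov(v, v))


def corr(u, v):
    return cov(u, v) / (sd(u) * sd(v))


def variance_of_sum(parts):
    """Sum of the part variances plus 2 * r * S * S for every pair of parts."""
    total = sum(sd(p) ** 2 for p in parts)
    for i in range(len(parts)):
        for j in range(i + 1, len(parts)):
            total += 2 * corr(parts[i], parts[j]) * sd(parts[i]) * sd(parts[j])
    return total


def corr_of_sums(x_parts, y_parts):
    """Correlation of two sums from component rs and Ss only."""
    numerator = sum(
        corr(xi, yj) * sd(xi) * sd(yj) for xi in x_parts for yj in y_parts
    )
    return numerator / (sqrt(variance_of_sum(x_parts)) * sqrt(variance_of_sum(y_parts)))


# Hypothetical scores for six individuals.
x_a = [2, 4, 5, 7, 9, 10]
x_b = [1, 3, 2, 5, 4, 6]
x_c = [8, 6, 7, 5, 6, 3]
y_A = [3, 5, 4, 8, 7, 9]
y_B = [2, 2, 4, 3, 6, 5]

from_components = corr_of_sums([x_a, x_b, x_c], [y_A, y_B])

# Direct route: form the sum scores, then correlate them.
x_sum = [a + b + c for a, b, c in zip(x_a, x_b, x_c)]
y_sum = [a + b for a, b in zip(y_A, y_B)]
direct = corr(x_sum, y_sum)

print(round(from_components, 6), round(direct, 6))
```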

Suppose x is made up of m variables and y of M variables (m not
necessarily equal to M):

\[ x = x_a + x_b + \cdots + x_i + \cdots + x_m \]

\[ y = y_A + y_B + \cdots + y_I + \cdots + y_M \]

A little thought will indicate that the expression for the r between the
two sums will involve a numerator containing mM terms of the type
$r_{iI} S_i S_I$ with $i = a, \cdots, m$ and $I = A, \cdots, M$. These are
the cross-correlations. Under one radical sign we will have m variances of
the type $S_i^2$, plus either m(m - 1)/2 terms of the type
$2 r_{ij} S_i S_j$ with $i < j$ or m(m - 1) terms like $r_{ij} S_i S_j$ with
$i \neq j$; and under the other radical we will have M variances and
M(M - 1) terms like $r_{IJ} S_I S_J$ with $I \neq J$. Instead of writing out
all these terms, we can use $\sum$ to indicate the adding process.
Accordingly, a general formula for the correlation of two sums (or averages)
can be written as

\[ r_{\Sigma x \Sigma y} = \frac{\sum r_{iI} S_i S_I}
{\sqrt{\sum S_i^2 + \sum r_{ij} S_i S_j} \, \sqrt{\sum S_I^2 + \sum r_{IJ} S_I S_J}}
\qquad (i \neq j;\ I \neq J) \tag{12.15} \]


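This formula can be written in a different way if it is noted that any
sum can be replaced by the appropriate mean times the number of terms
summed. Using an overhead bar to indicate a mean, we have

\[ r_{\Sigma x \Sigma y} = \frac{mM \, \overline{r_{iI} S_i S_I}}
{\sqrt{m \overline{S_i^2} + m(m - 1) \, \overline{r_{ij} S_i S_j}} \,
\sqrt{M \overline{S_I^2} + M(M - 1) \, \overline{r_{IJ} S_I S_J}}} \tag{12.16} \]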


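Now suppose that the m components of x are parallel measures of trait X and
that the M components of y are parallel measures of trait Y. By definition
of parallel measures, the following hold:

(1) $S_a = S_b = \cdots = S_i = \cdots = S_m$

(2) $S_A = S_B = \cdots = S_I = \cdots = S_M$

(3) all $r_{ij}$ are equal to one another

(4) all $r_{IJ}$ are equal to one another

and we would expect

(5) all $r_{iI}$ to have the same value

Making these substitutions in (12.16) leads to

\[ r_{\Sigma x \Sigma y} = \frac{mM r_{iI}}
{\sqrt{m + m(m - 1) r_{ij}} \, \sqrt{M + M(M - 1) r_{IJ}}} \tag{12.17} \]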
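Dividing both numerator and denominator by mM, we find that, as m and M
become indefinitely large, 1/m and 1/M vanish, leaving 1 times $r_{ij}$ and
1 times $r_{IJ}$ under the radicals. Thus the correlation between x and y,
each based on an infinite number of parallel measures, is

\[ r_{\infty \infty} = \frac{r_{iI}}{\sqrt{r_{ij}} \, \sqrt{r_{IJ}}} \tag{12.18} \]

Now $r_{ij}$ is the reliability coefficient $r_{xx}$, $r_{IJ}$ is the
reliability coefficient $r_{yy}$, and $r_{iI}$ is the correlation between
any $X_i$ and $Y_I$. Therefore, for m and M infinitely large,

\[ r_{\infty \infty} = \frac{r_{xy}}{\sqrt{r_{xx} r_{yy}}} \tag{12.19} \]

This is, as might have been anticipated, the correction for attenuation
formula.

If the x and y components are all parallel measures of the same trait, say
X, the five conditions still hold, but now $r_{iI} = r_{ij} = r_{IJ} = r_{xx}$,
and with M = m formula (12.17) becomes

\[ r_{\Sigma x \Sigma y} = \frac{m^2 r_{xx}}{m + m(m - 1) r_{xx}}
= \frac{m r_{xx}}{1 + (m - 1) r_{xx}} \tag{12.20} \]

which is the previously derived Spearman-Brown formula. To find the number
of parallel measures needed to attain a desired reliability, say .90, we
simply solve (12.20) for m with $r_{\Sigma x \Sigma y}$ set to .90:

\[ m = \frac{.90(1 - r_{xx})}{r_{xx}(1 - .90)} \]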
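The use of the solution for m may be illustrated with a hypothetical reliability. The sketch below assumes that (12.20) has the Spearman-Brown form, $m r_{xx} / [1 + (m - 1) r_{xx}]$, and that solving it for m gives the expression just above; the function names and the single-measure reliability of .60 are our own choices.

```python
def reliability_of_sum(r_xx, m):
    """Reliability of a sum of m parallel measures (Spearman-Brown form)."""
    return m * r_xx / (1 + (m - 1) * r_xx)


def measures_needed(r_xx, target):
    """Number of parallel measures needed to reach the target reliability."""
    return target * (1 - r_xx) / (r_xx * (1 - target))


# Hypothetical: one measure has reliability .60; we want .90 for the sum.
m_needed = measures_needed(0.60, 0.90)
check = reliability_of_sum(0.60, m_needed)
print(round(m_needed, 2), round(check, 2))
```

When the solution is not a whole number, it would be rounded upward in practice, since a fractional measure cannot be administered.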




Chapter 13 

FREQUENCY COMPARISON: 

CHI SQUARE 


The quantity chi square ($\chi^2$), defined in the last chapter as

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \tag{12.11} \]

or as the sum of the squared discrepancies between observed and expected
frequencies, each divided by the expected frequency, is a statistic very
useful in a variety of problems involving frequencies. Let us begin by an
examination of what might be expected to happen if a penny were tossed
100 times. The expected frequency for heads is 50 and for tails is also 50.
If for a particular series of tosses we secured 55 heads and 45 tails, the
discrepancies would be +5 and -5. When these discrepancies are squared
each becomes +25, and dividing each squared discrepancy by the expected
value we would have .5 + .5 = 1.0 as the value for $\chi^2$. Had we obtained
40 heads and 60 tails, the discrepancies of -10 and +10, when squared
and divided by E, would give 2 + 2 = 4 as $\chi^2$.
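The two penny-tossing results can be verified in a few lines. This is a minimal sketch; the function name is our own, and it simply sums (O - E) squared over E for the categories supplied.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# 100 tosses of a penny: expected 50 heads and 50 tails.
chi2_55_45 = chi_square([55, 45], [50, 50])
chi2_40_60 = chi_square([40, 60], [50, 50])
print(chi2_55_45, chi2_40_60)
```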

Three things are readily apparent from the aforementioned: first, the
greater the discrepancy relative to E, the greater the contribution to
$\chi^2$; second, the two parts being summed to obtain $\chi^2$ are not
independent, since when the absolute discrepancy for heads is known, that
for tails can be inferred to be the same; and third, the squaring process
means that $\chi^2$ is never negative regardless of the direction of the
discrepancies. A fourth fact becomes apparent if we recall what happens when
a series of tosses is repeated. The number of heads (or tails) secured will
vary from one series of 100 tosses to the next; hence the amount of
discrepancy will vary, and therefore the magnitude of $\chi^2$ will vary from
series to series. In other words, successive sampling will yield varying
values for $\chi^2$. If we knew the sampling distribution for $\chi^2$, we
could specify the probability of obtaining by chance as large a value of
$\chi^2$ as any given value, and thereby we could judge whether a given
amount of discrepancy is significantly large enough to warrant the
conclusion that the coin is biased.

Situations similar to this arise in research work. We may, on the basis
of a hypothesis that a certain proportion of individuals possess a given
characteristic, state how many of a sample of N cases would be expected to
possess the characteristic, and the sample will provide the observed
number. If the hypothesis is tenable, the discrepancy between observed
and expected frequencies should be no larger than might arise on the basis
of chance. If the obtained discrepancy is too large, i.e., not apt to arise
by chance, the hypothesis becomes suspect. The student who recalls that the
standard error of a proportion can be used in comparing observed with
expected proportions may wonder whether another technique is necessary. An
answer will be forthcoming.

CHI SQUARE AND THE BINOMIAL DISTRIBUTION 

Perhaps some insight regarding the sampling distribution of $\chi^2$ can be
obtained by a re-examination of the binomial distribution, which was
discussed earlier. Suppose we consider the binomial distribution,
$(p + q)^{10}$ with $p = q = 1/2$, as yielding the chance distribution of
number of heads when 10 unbiased coins are tossed (see Table 13.1). When 10
coins are tossed we expect to get 5 heads and 5 tails, that is, the Es are 5
and 5, but

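Table 13.1. The binomial and $\chi^2$ when 10 coins are tossed

Number of
  Heads        f_b      chi^2     f for chi^2

   10            1      10.0           2
    9           10       6.4          20
    8           45       3.6          90
    7          120       1.6         240
    6          210        .4         420
    5          252        .0         252
    4          210        .4
    3          120       1.6
    2           45       3.6
    1           10       6.4
    0            1      10.0

  Sums        1024                  1024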
for a particular toss we will have an observed number of heads (and tails)
which may differ from 5 and 5. The observed values, or Os, could be 10
heads and zero tails; 9 heads, 1 tail; and so on to zero heads, 10 tails.
If we obtained 9 heads and 1 tail, we could write
$\chi^2 = (9 - 5)^2/5 + (1 - 5)^2/5 = 6.4$. Similarly, if we compute
$\chi^2$ for 10 heads and no tails we get a value of 10.0; for 8 heads and
2 tails we get 3.6; etc. Note that for each $\chi^2$,
$\sum E = \sum O = 10$.

The third column of Table 13.1 gives the values of $\chi^2$ for various
possible sets of observed frequencies for number of heads and tails. All the
given numerical values of $\chi^2$, except 0, appear twice: 9 tails and 1
head will obviously lead to the same $\chi^2$ as 9 heads and 1 tail. Now the
probability of obtaining 9 heads and 1 tail is 10/1024 and the P for 1 head
and 9 tails is also 10/1024; hence the P for obtaining a $\chi^2$ of 6.4 is
20/1024. Likewise, we may combine the appropriate binomially derived chance
frequencies ($f_b$) so as to write the chance frequencies for the several
$\chi^2$ values. These appear as the fourth column of the table. We have thus
established the chance or probability distribution of $\chi^2$ for a
specified coin-tossing situation. A plot of these frequencies against the
$\chi^2$ values will reveal a highly skewed distribution.

The probability of a $\chi^2$ as large as 6.4 will be 20/1024 + 2/1024, or
22/1024, a value which obviously represents the probability of a
discrepancy, between O and E, as great as 4 in either direction (at least 9
heads or at least 9 tails). The P of 22/1024 involves 1 tail of the
distribution of $\chi^2$ values, but both tails of the binomial contribute
thereto. This fact will need to be recalled below when we discuss one- vs.
two-tailed tests of hypotheses.
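The chance distribution of $\chi^2$ for the 10-coin situation can be generated directly from the binomial coefficients. The following sketch, with variable names of our own, tallies the binomial frequency (out of 1024) falling at each possible $\chi^2$ value and then sums the frequencies for values as large as 6.4. Exact fractions are used so that equal $\chi^2$ values combine without rounding trouble.

```python
from fractions import Fraction
from math import comb

n_coins = 10
expected = Fraction(n_coins, 2)   # 5 heads and 5 tails expected
total = 2 ** n_coins              # 1024 equally likely outcomes

# Chance frequency (out of 1024) for each possible chi square value.
chi2_freq = {}
for heads in range(n_coins + 1):
    tails = n_coins - heads
    chi2 = (heads - expected) ** 2 / expected + (tails - expected) ** 2 / expected
    chi2_freq[chi2] = chi2_freq.get(chi2, 0) + comb(n_coins, heads)

for chi2 in sorted(chi2_freq, reverse=True):
    print(float(chi2), chi2_freq[chi2])

# Probability of a chi square as large as 6.4 (9 or 10 heads, or 9 or 10 tails).
p_tail = Fraction(
    sum(f for value, f in chi2_freq.items() if value >= Fraction(32, 5)), total
)
print(p_tail)
```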

Before we leave Table 13.1, it might be well to point out a connection
between $\chi^2$ and $x/\sigma$. Consider again an obtained frequency of 9
heads. If we express 9 as a deviation from the mean of the binomial, np = 5,
relative to the $\sigma$ of the binomial, $\sqrt{npq} = 1.581$, we have
4/1.581, which when squared gives 6.401 or the corresponding value of
$\chi^2$ (within limits of rounding error). This agreement is not accidental;
as will be seen shortly, under specifiable conditions
$\chi^2 = (x/\sigma)^2$. Another characteristic of $\chi^2$ is obvious from
Table 13.1: for the 10-coin situation no values of $\chi^2$ other than those
given can be obtained because the possible number of heads (and tails) is a
discrete series. This lack of continuity imposes a restriction on the use of
$\chi^2$ which will receive more attention as we proceed.
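The agreement between $\chi^2$ and the squared deviate is not confined to 9 heads, as a quick check over every possible outcome of the 10-coin toss will show. The sketch below is our own illustration; it computes $\chi^2$ from the two categories and the square of (heads minus np) over the square root of npq, and compares the two.

```python
from math import sqrt

n, p = 10, 0.5
q = 1 - p

pairs = []
for heads in range(n + 1):
    tails = n - heads
    # Chi square from the two categories, heads and tails.
    chi2 = (heads - n * p) ** 2 / (n * p) + (tails - n * q) ** 2 / (n * q)
    # Deviation from the binomial mean relative to the binomial sigma, squared.
    z_squared = ((heads - n * p) / sqrt(n * p * q)) ** 2
    pairs.append((heads, chi2, z_squared))

for heads, chi2, z_squared in pairs:
    print(heads, round(chi2, 3), round(z_squared, 3))
```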

The $\chi^2$ values in Table 13.1 are for possible discrepancies of observed
frequencies from an expected frequency of 5 for a single toss of 10 coins.
Suppose that we have, as shown in Table 13.2, an observed distribution of
frequencies obtained by tossing 7 coins 1000 times, and that we wish to
compare these observed frequencies with those expected on the basis of
the binomial expansion. We are not concerned this time with a single toss
for which the expectation would be 3.5, but rather with the results expected
when a large number of tosses are made. Note that both the E column and
the O column sum to 1000 (or N) and that the (O - E)s sum to zero. The
several contributions to $\chi^2$ are given in the last column, which sums to
7.65, or the $\chi^2$ for the entire table. Two other series of 1000 tosses
made by students in the author's classes yielded $\chi^2$ values of 12.52 and
15.02. Two of these values for $\chi^2$ are larger than any of the values in
Table 13.1, and one reason for this is the fact that more $(O - E)^2/E$
terms are being summed: 8 such values instead of 2. Thus, the possible
magnitude of a $\chi^2$

Table 13.2. $\chi^2$ for discrepancies of expected and observed
frequencies when 7 coins were tossed 1000 times

Number of
  Heads         E         O       O - E     (O - E)^2 / E

    7           8         4        -4           2.00
    6          55        55         0            .00
    5         164       157        -7            .30
    4         273       283        10            .37
    3         273       267        -6            .13
    2         164       177        13           1.03
    1          55        45       -10           1.82
    0           8        12         4           2.00

  Sums       1000      1000         0           7.65
              (N)       (N)                  ($\chi^2$)


would seem to be a function of two things: the size of the squared
discrepancies (relative to their respective Es) and the number of categories
or possibilities for discrepancy. Actually, the chance or sampling
distribution of $\chi^2$ is only indirectly a function of the number of
discrepancies; it is a direct function of the number of independent
discrepancies or the degrees of freedom, which we shall next discuss.
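The entries of Table 13.2 can be regenerated from the binomial expansion. In the sketch below the observed frequencies are those of the table; the expected frequencies are obtained as 1000 times the binomial probabilities for 7 unbiased coins and rounded to whole numbers, which we assume is how the E column was prepared. The variable names are our own.

```python
from math import comb

tosses, coins = 1000, 7

# Observed frequencies from Table 13.2, keyed by number of heads.
observed = {7: 4, 6: 55, 5: 157, 4: 283, 3: 267, 2: 177, 1: 45, 0: 12}

# Expected frequencies from the binomial expansion, rounded to whole numbers.
expected = {h: round(tosses * comb(coins, h) / 2 ** coins) for h in observed}

# Contribution of each category to chi square.
contributions = {
    h: (observed[h] - expected[h]) ** 2 / expected[h] for h in observed
}
chi2 = sum(contributions.values())
discrepancy_sum = sum(observed[h] - expected[h] for h in observed)

for h in observed:
    print(h, expected[h], observed[h], round(contributions[h], 2))
print(round(chi2, 2), discrepancy_sum)
```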


DEGREES OF FREEDOM 

We have seen that the $\chi^2$ of 6.4 in Table 13.1 involves two
$(O - E)^2/E$ values: $(9 - 5)^2/5$ and $(1 - 5)^2/5$, or two discrepancies
of exactly the same absolute magnitude. This means that the two
discrepancies are not independent: as soon as one is calculated, the other
can be written down at once without any further calculation; hence 1 degree
of freedom exists. If we study the data of Table 13.2, we see that, since
the discrepancies must sum to zero, all eight cannot be independent or vary
freely. As soon as seven of them are known, the eighth is determined; hence
the number of degrees of freedom (df) is 1 less than the number of
categories.

Table 13.3. $\chi^2$ and the fourfold table
(Expected frequencies in parentheses)

                No            Yes          Totals

Group 1       50  (40)      50  (60)     100 = N_1
Group 2       70  (80)     130 (120)     200 = N_2

Totals       120 = N_n     180 = N_y     300 = N


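The df for other situations in which the $\chi^2$ technique is applicable
will be considered as they arise. For the fourfold table (Table 13.3), the
expected frequencies are computed on the assumption that the two groups do
not differ (the null hypothesis). If this were the case, we would expect
that the 180 yeses would be divided between the two groups in proportion to
the right-hand totals; likewise for the 120 noes. Note that the two top-row
values must sum to $N_1$, the two bottom-row values to $N_2$, the left-hand
column must sum to $N_n$, and the next column to $N_y$. With the marginal
totals thus fixed, only one cell frequency is free to vary.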


In general, the df for a contingency table equals the number of rows less 1
times the number of columns less 1. Thus for the fourfold table we have
(2 - 1)(2 - 1) = 1, and for the 3 by 3 table (3 - 1)(3 - 1) = 4.


SAMPLING DISTRIBUTION OF $\chi^2$


The equation for the $\chi^2$ distribution† involves n, the number of
degrees of freedom; hence there is no one distribution but a very large
number of them. For practical purposes, curves can be drawn for various ns
with $\chi^2$ along the abscissa

† \[ y = \frac{1}{2^{n/2} \, \Gamma(n/2)} (\chi^2)^{(n - 2)/2} e^{-\chi^2/2} \]
in which $\Gamma$ indicates the gamma function as defined in texts in advanced calculus.






Fig. 13.1. Chi square distributions for various dfs. $\chi^2$ along abscissa.

and the ordinates as the y values obtained by the equation in the footnote.
The area under each curve will be 1 unit, as in the unit normal curve.
Figure 13.1 contains curves for seven different values of n or df, so drawn
as to be comparable. Note that the shapes of these curves and their
locations along the abscissa vary with n.
For n = 1, or for 1 degree of freedom, the curve starts very high
(strictly speaking it is asymptotic to the ordinate and hence starts at
infinity) and drops quite rapidly. For this curve the height or y value at
$\chi^2 = .16$ is .92 (not shown). At $\chi^2 = .01$, the height is more than
four times greater than .92. By the time we reach a $\chi^2$ of 1.00, the
height is .242 (what $x/\sigma$ value does this height correspond to when the
unit normal curve is considered?). Then the curve trails off until, at
$\chi^2 = 6.25$, the height is about .007. Regardless of n, the right-hand
parts of the curves never reach the base line; i.e., they are asymptotic. If
we think of the total area under any curve as unity, the area between
ordinates erected at any two base-line points, or the area beyond any point,
can be expressed as a proportion of the total. Thus, for n = 1, .99 of the
area is beyond (to the right of) a $\chi^2$ value of .000157, and only .05 is
beyond 3.841. Stated differently, the probability of obtaining a $\chi^2$
value as large as 3.841 is .05; for $\chi^2$ as large as 6.635, P = .01; and
the P = .001 point is at a $\chi^2$ of 10.827. These hold only for df = 1.
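The heights quoted for the n = 1 curve can be checked by evaluating the ordinate directly. The sketch below assumes the standard form of the $\chi^2$ distribution equation, $y = (\chi^2)^{(n-2)/2} e^{-\chi^2/2} / [2^{n/2} \Gamma(n/2)]$; the function name is our own.

```python
from math import exp, gamma


def chi_square_ordinate(chi2, n):
    """Height of the chi square curve at chi2 for n degrees of freedom (standard form)."""
    return chi2 ** ((n - 2) / 2) * exp(-chi2 / 2) / (2 ** (n / 2) * gamma(n / 2))


height_016 = chi_square_ordinate(0.16, 1)
height_001 = chi_square_ordinate(0.01, 1)
height_100 = chi_square_ordinate(1.00, 1)
height_625 = chi_square_ordinate(6.25, 1)

print(round(height_016, 2), round(height_001 / height_016, 2))
print(round(height_100, 3), round(height_625, 3))
```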

The curve for n = 2 starts at a height of .50 and then descends, but less
rapidly than that for n = 1. It is readily seen that large values for
$\chi^2$ occur more frequently when n = 2 than when n = 1. The P = .05 point
is at 5.991; i.e., the probability of obtaining by chance a $\chi^2$ value as
great as 5.991 is .05. The .01 point is at 9.210, and the .001 point is at
13.815.

For n = 3, the distribution curve begins at zero height, rises sharply to a
maximum (modal value) at $\chi^2 = 1$, and then falls off so that the P = .01
point is at $\chi^2 = 11.341$. As n is taken larger and larger, the
distributions become less and less skewed and move farther and farther to
the right. The mean of a given distribution always corresponds to a $\chi^2$
equal to n, and, except for n = 1, the modal value is at a $\chi^2$ of n - 2.
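For 1 and 2 degrees of freedom the areas beyond a given $\chi^2$ can be had without tables, since simple closed forms exist: for n = 1 the area is the two-tailed normal area beyond the square root of $\chi^2$, and for n = 2 it is $e^{-\chi^2/2}$. The sketch below, with function names of our own, applies these to the points quoted above; it does not cover n of 3 or more, for which tables are needed.

```python
from math import erfc, exp, sqrt


def tail_area_df1(chi2):
    """P(chi square >= chi2) for 1 df: two-tailed normal area beyond sqrt(chi2)."""
    return erfc(sqrt(chi2 / 2))


def tail_area_df2(chi2):
    """P(chi square >= chi2) for 2 df: exp(-chi2 / 2)."""
    return exp(-chi2 / 2)


df1_points = {x: tail_area_df1(x) for x in (3.841, 6.635, 10.827)}
df2_points = {x: tail_area_df2(x) for x in (5.991, 9.210)}

for x, p in df1_points.items():
    print(1, x, round(p, 4))
for x, p in df2_points.items():
    print(2, x, round(p, 4))
```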

The distributions of $\chi^2$ for varying ns are theoretical probability
distributions. They may be interpreted as random sampling distributions, and
by them we can judge the statistical significance of discrepancies. Their
use is exactly analogous to testing the significance of the difference between
means, which it will be recalled involves setting up the null hypothesis: if
there is no real difference between two universe means, the $D/S_D$ values
for successive samples will form a normal curve with center at zero and with
unit variance. If a found difference is 1.96 times its standard error, the
null hypothesis becomes suspect; if 2.58 times its standard error, the
hypothesis of no difference can fairly safely be rejected; if
$D/S_D = 3.00$, rejection is more definitely indicated. These three zs, it
will be recalled, correspond to the .05, the .01, and the .003 levels of
significance, for two-tailed tests.

Now $\chi^2$ can likewise be used to test the null hypothesis. The essential
difference between the $D/S_D$ and the $\chi^2$ techniques is that the latter
involves skewed probability distributions; but, knowing the distribution for
a given n, we can ascertain the necessary value of $\chi^2$ for the .05, the
.01, the .001, or other levels of significance. The statement of the null
hypothesis in connection with $\chi^2$ may vary slightly according to the
given situation. If the frequencies in the universe agree with the a priori
expected frequencies, if the frequencies in two or more universes are the
same, if there is zero association in the universe between two
classifications or variables, in short if any such conditions hold for the
universe or universes, then successive samplings will yield $\chi^2$ values
which will distribute themselves in a determinable manner, thus permitting
us to specify the probability of obtaining by chance a $\chi^2$ value as
large as any given or obtained value. When this probability is small, say
.01 or less, the null hypothesis is rejected, and its rejection implies that
there are real discrepancies, or that real differences exist, or that there
is a real association.

Since the random sampling distribution of χ² depends on the df, which 
varies from situation to situation, it is not feasible to give a rule-of-thumb 
criterion in terms of the magnitude of χ² which would be deemed 
significant. If we adopt P = .01 as the level of significance we wish to attain, we 
need to refer to available tables of χ² in order to find how large χ² must be 
to correspond to this level; likewise for any other chosen level of 
significance. Probability tables for χ² are available in two forms. One form, 
Fisher’s (see Table D of the Appendix), gives the values of χ² which will be 
exceeded by chance a specified proportion of the time, such as .10, .05, .01, and 
.001. Elderton’s table† gives the probabilities for obtaining chi squares as 
large as specified values expressed as integers, such as 1, 2, 3, ..., 21, 22. 
Both tables include varying degrees of freedom. Because of an early 
erroneous notion as to the meaning of degrees of freedom, Elderton’s table 
must be entered with df equal to 1 less than his n' values, e.g., use n' = 4 
when n or df = 3. Elderton’s table has one advantage over that given in 
our Appendix: P values as small as .000001 can be ascertained. 

For ns larger than 30, the expression √(2χ²) − √(2n − 1) will have a 
sampling distribution which will follow very closely the unit normal curve. 
The probability is accordingly .05 that this expression will exceed +1.64, 
and .01 that it will exceed +2.33, by chance. 
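
The large-n approximation just described can be sketched in a few lines of Python. This is only an illustration: the function names and the example values (a chi square of 60 with n = 40) are assumptions made here, not taken from the text, and the normal tail area is obtained from the standard library's `math.erfc`.

```python
import math

def chi2_normal_deviate(chi2, n):
    """sqrt(2 * chi2) - sqrt(2n - 1), treated as a unit normal deviate for n > 30."""
    return math.sqrt(2.0 * chi2) - math.sqrt(2.0 * n - 1.0)

def upper_tail_normal(z):
    """Probability that a unit normal deviate exceeds z (one tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical example: chi square of 60 with n = 40 degrees of freedom.
z = chi2_normal_deviate(60.0, 40)
p = upper_tail_normal(z)
print(round(z, 2), round(p, 3))
```

Only the upper tail is used, which is why the deviates quoted for the .05 and .01 points are 1.64 and 2.33 rather than 1.96 and 2.58.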

Before the applications of χ² are summarized, a word should be said 
about the underlying assumptions which restrict its usage. In the 
derivation of the equation for the χ² distribution(s) it is assumed that the 
sample discrepancies, or deviations of observed from expected frequencies, 
follow a normal distribution. In practical applications this assumption can 
easily be violated in two senses: skewed distribution for O − E values and lack 
of continuity. If E is small, say, equal to 2, the Os are restricted on 
one side of E to zero and 1, whereas on the other side the possible values 
may be 3, 4, 5, and upward. Such a curtailment ordinarily leads to a 
skewed distribution of the observed frequencies (if the other side were 
restricted to just 3 and 4, symmetry could exist). Now it is obvious that 
when E is small we have a greater degree of discontinuity, hence the 
sampling distribution of the observed frequencies (and therefore, of O − E 
values) will be discrete instead of continuous as required for the normal 
curve. Even for the situation involving a symmetrical distribution of 
Os about E, such as that in Table 13.1, the discontinuities in the possible 
observed χ² values are marked. But that situation was for df = 1, and, as 
in the approximation of binomial probabilities by use of the normal curve, 
a correction for continuity will better the approximation when the χ² 
curve is used to obtain the probability for as great a discrepancy as the 
observed one. Even so, with df = 1 and with the correction for continuity 
(correction formulas yet to be given herein), it is not safe to use χ² if 
an E is less than 5. 

† Table XII in Pearson, Karl, Tables for statisticians and biometricians, part I, 
Cambridge: Cambridge University Press, 1931. 

However, the effect of small Es in producing discontinuities is not 
as marked when df is 2 or more. Consider the dice-rolling situation. If 
we rolled 12 dice once we would have 2 as the expected frequency for 
each of the possible outcomes: 1, 2, 3, 4, 5, 6 spots. For successive 
rolls, or trials, we would have Os for each outcome that would vary from 
zero up to a possible (though unlikely) 12. In calculating χ² for each 
trial we would be summing six (O − E)²/E, or (O − 2)²/2, ratios with each 
of the six involving marked discontinuity over successive trials. The df 
would, of course, be 5. But over a large number of successive trials, 
the possible combinations of O values per trial are so numerous that 
the calculated χ²s, hence the sample χ²s, will show relatively little 
discontinuity. In Table 13.1 we saw, for a situation involving E = 5 and 
df = 1, that χ² can have just six values, whereas for the foregoing dice 
situation the summing of six (O − E)²/E ratios permits a large number of 
possible values for calculated χ²s, or a greater approach to continuity. 
This is somewhat analogous to the approximation of the discontinuous 
binomial by the continuous normal distribution, which approximation 
becomes increasingly better as n increases (or as the n + 1 possible 
outcomes increase). 
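
The contrast between the two situations can be checked by brute-force enumeration. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the Table 13.1 situation (not reproduced here) corresponds to 10 observations falling into two equally likely categories, so that E = 5 in each cell.

```python
from itertools import combinations_with_replacement

def distinct_chi2_values(n_items, n_categories):
    """Distinct chi-square values when n_items fall into equally likely categories."""
    expected = n_items / n_categories
    values = set()
    # Chi square depends only on the multiset of cell counts, so it is enough
    # to enumerate non-decreasing tuples of counts that sum to n_items.
    for counts in combinations_with_replacement(range(n_items + 1), n_categories):
        if sum(counts) == n_items:
            chi2 = sum((o - expected) ** 2 / expected for o in counts)
            values.add(round(chi2, 6))
    return sorted(values)

two_cell = distinct_chi2_values(10, 2)   # assumed Table 13.1 setup: E = 5, df = 1
dice = distinct_chi2_values(12, 6)       # 12 dice: E = 2 in each of six cells, df = 5
print(len(two_cell), len(dice))
```

The two-cell case yields only a handful of attainable values, whereas the dice case yields many more, which is the sense in which the larger df gives a closer approach to continuity.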

Although discontinuity as an aspect of the violation of the assumption 
of normality for Os about small Es is not serious when the df is more than 
5 or 6, there remains the question of the effect of possible skewness when 
Es are small. There is evidence that, when df is not small, Es as low as 
2 will not produce misleading χ² values. 

A second assumption is that the observations be independent of one 
another. This assumption is violated when the total of the observed 
frequencies exceeds the total number of persons in the sample(s). Such an 
inflation of N occurs when multiple observations are made on each person 
and each person is counted more than once (cf. p. 93). 


APPLICATIONS 

The chief situations for which it is permissible to use χ² may be classified 
as follows. 

1. The discrepancy of observed frequencies from frequencies expected 
on the basis of some a priori principle. Such situations are most frequently 
found in genetics, wherein it is hypothesized that certain crossings should 
lead to the presence, in a certain proportion of offspring, of some defined 
characteristic or variation thereof. The frequency table for such situations 
is 1 by k with k − 1 degrees of freedom, since the only restriction is that 
the expected frequencies must sum to N. This type of situation does not 
arise often in research in the social sciences. 

2. Contingency tables. Here we have two types of situations which 
differ only in the methods of classifying. 

a. We may have a contingency table which is analogous to a correlation 
table in that both classifications are based on continuous or ordered 
discrete variables for which we have only categorized information for N 
individuals. The two variables might be in dichotomy (fourfold table), or 
one might be a dichotomy and the other manifold, or both might involve 
multiple categories. For these contingency tables it is meaningful to 
speak of the correlation between the two variables, and the degree of 
correlation might be appropriately specified by the tetrachoric r or the 
fourfold point r or the contingency coefficient (corrected or uncorrected); 
which measure is used depends upon meeting the requisite assumptions. 
Insofar as we are concerned only with χ², we have the means for testing the 
significance of the correlation or association as a chance departure from 
zero or no relationship, and the significance test can be used without 
knowledge of the degree of correlation. Such a test of significance is 
sometimes spoken of as a test of independence: are the two classifications 
independent? If so, χ² should be no larger than would arise by chance. If 
we have evidence for correlation or a lack of independence from the χ² 
technique, we can proceed to calculate an appropriate coefficient for 
measuring the degree of correlation or the strength of association. The student 
should, as an exercise, convince himself that χ² per se is not a measure of 
correlation. 
b. The other contingency-type situation involves classification into 
categories for one variable vs. classification into unordered groups for the 
other, or one unordered grouping vs. another. The fundamental problem 
is apt to be that of comparing two or more groups with regard to multiple 
responses; i.e., we want a test of the difference between groups rather than 
a measure of correlation, which would not be entirely meaningful except 
in the sense that a particular response is associated more often with 
[...]. As previously stated, the df for a k by l contingency 
table is (k − 1)(l − 1). 

3. If we wish to check on whether it is reasonable to 
believe that a frequency distribution is, within the limits of chance 
sampling, of the normal or some other specified type, a frequency curve 
having the same basic constants (e.g., N, M, and S for the normal curve) 
as the observed frequency distribution can be fitted 
to the data. If a normal curve is being fitted, the table of normal curve 
functions is used to set up the theoretical or expected frequencies for the 
[...] intervals. Then χ² can be computed in the usual manner. 
The df will correspond to the number (k) of grouping intervals less the 
number of constants derived from the data and used in the fitting process. 
[...] the observed and theoretical distributions are made 
to agree as to N, M, and S; hence df = k − 3. An attempt will be made 
later to explain the reasoning back of the determination of df when 
checking the goodness of fit of frequency curves. 

Fourfold contingency tables. For illustrative purposes, let us first 
consider the application of χ² to 2 by 2 contingency tables for which the 
tetrachoric r, as well as the contingency coefficient, is an appropriate 
measure of the correlation. Before we do, it might be well to recall that χ² 
for a fourfold table can be computed by a simple formula which does not 
require calculation of the four expected frequencies. Let the fourfold 
frequencies and marginal totals be set up as in Table 13.4. Chi square can 
be computed from 

\[ \chi^2 = \frac{N(BC - AD)^2}{(A + B)(C + D)(A + C)(B + D)} \tag{13.1} \]


Table 13.4. Setup for computing χ² from a 
fourfold table by means of a formula 

      A        B      |  A + B 
      C        D      |  C + D 
    ------------------+-------- 
    A + C    B + D    |    N 


This is simpler than calculation from the discrepancies between observed 
and expected frequencies. The requisite that no expected frequency shall 
be less than 5 still holds. A quick check on this can be obtained by 
multiplying the two smaller marginal totals and dividing by N, which gives the 
smallest expected frequency. In Table 13.5 will be found two fourfold 

Table 13.5. χ² applied to contingency (fourfold) tables 

                  Item 1                              Item 3 
                −     +   Total                     −     +   Total 
    Item 2  +   29    39    68          Item 4  +   34    37    71 
            −   22    10    32                  −   94    35   129 
    Total       51    49   100          Total      128    72   200 

    χ² = 5.93                           χ² = 12.40 
    P about .01                         P less than .001 



tables for Stanford-Binet items. Direct substitution into formula (13.1) 
gives the χ² values shown in the table. 
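
As a check on the shortcut, the following sketch computes χ² for the two fourfold tables of Table 13.5 both by formula (13.1) and by the longer route through the four expected frequencies, and also reports the smallest expected frequency. The function names are hypothetical, and the cell order (A, B, C, D) is assumed to follow the layout of Table 13.4.

```python
def chi2_fourfold(a, b, c, d):
    """Formula (13.1): chi square from cell frequencies laid out as in Table 13.4."""
    n = a + b + c + d
    return n * (b * c - a * d) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi2_from_expected(a, b, c, d):
    """Same statistic as the sum of (O - E)^2 / E over the four cells."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n      # expected frequency from the margins
            total += (observed[i][j] - e) ** 2 / e
    return total

def smallest_expected(a, b, c, d):
    """Product of the smaller row total and the smaller column total, over N."""
    n = a + b + c + d
    return min(a + b, c + d) * min(a + c, b + d) / n

for cells in [(29, 39, 22, 10), (34, 37, 94, 35)]:   # the two tables of Table 13.5
    print(round(chi2_fourfold(*cells), 2),
          round(chi2_from_expected(*cells), 2),
          round(smallest_expected(*cells), 2))
```

Both routes agree, and in each table the smallest expected frequency is well above 5.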

Table 13.6. χ² used to test sex differences in passing (+) or failing (−) a Binet item 

    Age           6                  7                  8                  9 
              −    +  Total      −    +  Total      −    +  Total      −    +  Total 
    B         84   18   102      66   36   102      58   44   102      37   66   103 
    G         93    8   101      80   20   100      62   39   101      52   49   101 
    Total    177   26   203     146   56   202     120   83   203      89  115   204 

    χ²            4.30               5.89                .43               5.02 
    P             <.05               <.02               <.50               <.05 
[...] That a difference may exist is also suggested by the fact that boys 
are superior at all four age levels. This brings us to an [...] of χ²: the 
several chi squares for independent (i.e., based on different samples) tables 
may be summed to a total χ² with df equal to the number of chi squares 
being summed. Thus for Table 13.6 we have 4.30 + 5.89 + .43 + 5.02 = 15.64 
as the total χ², with df = 4 [...] it would be concluded that a real 
difference does exist for this item. 

[...] frequencies [...] because of age or other differences [...] first be 
legitimately combined [...] useful when consistency is present among 
comparisons, none of which taken singly possesses statistical significance. 
However, [...] nor insignificance for single comparisons constitutes a [...] 
for using the sum of chi squares as an over-all test of significance or as a 
means of arriving at one summary probability. 

[...] a sum of independent χ²s will [...] 

\[ \chi_a^2 = \sum \frac{(O_a - E_a)^2}{E_a} \quad\text{and}\quad \chi_b^2 = \sum \frac{(O_b - E_b)^2}{E_b} \]

If we added the two χ² values to get χ²_t = χ²_a + χ²_b, we would have exactly 
the same total χ² as we would obtain by summing all the possible ratios of 
the form (O_a − E_a)²/E_a along with those of the form (O_b − E_b)²/E_b. 
[...] simply added these ten ratios without first calculating χ²_a and χ²_b, 
we would get precisely the same value as χ²_t = χ²_a + χ²_b. [...] determining 
the degrees of freedom for χ²_t [...] 
total number of independent deviations in the two sets combined. But 
this total number is simply the sum of the dfs for the separate sets. Stated 
differently, χ² is not conscious of how it was computed, whether by first 
getting partial sums which are then added or by proceeding directly to a 
total sum. 
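
The additive property can be illustrated with the four fourfold tables of Table 13.6. The sketch below recomputes each χ² by formula (13.1), sums them, and obtains the probability for the total. The function names are hypothetical, and the closed-form tail probability used here is an assumption of this sketch that holds only for an even number of degrees of freedom; it stands in for a lookup in Table D.

```python
import math

def chi2_fourfold(a, b, c, d):
    """Formula (13.1) for a single fourfold table."""
    n = a + b + c + d
    return n * (b * c - a * d) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi2_upper_tail_even_df(chi2, df):
    """P(chi square >= chi2) for even df: exp(-x/2) * sum of (x/2)^k / k!."""
    if df % 2 != 0:
        raise ValueError("this closed form applies to even df only")
    half = chi2 / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Table 13.6, ages 6 to 9: (boys -, boys +, girls -, girls +)
tables = [(84, 18, 93, 8), (66, 36, 80, 20), (58, 44, 62, 39), (37, 66, 52, 49)]
parts = [chi2_fourfold(*t) for t in tables]
total = sum(parts)
df = len(parts)            # one degree of freedom per independent fourfold table
p_total = chi2_upper_tail_even_df(total, df)
print([round(x, 2) for x in parts], round(total, 2), df, round(p_total, 4))
```

The separate values reproduce those printed in the table, and the summed χ² with 4 degrees of freedom falls between the .01 and .001 points.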

Table 13.7. Schema for comparing groups via χ² and via difference between 
proportions (or percentages) 

    Frequencies 

    Group        +          −          Total 
    1            A          B          A + B = N₁ 
    2            C          D          C + D = N₂ 
    Total      A + C      B + D        N 

    Proportions 

    Group        +                  −                  Total 
    1          p₁ = A/N₁          q₁ = B/N₁          p₁ + q₁ = 1.0 
    2          p₂ = C/N₂          q₂ = D/N₂          p₂ + q₂ = 1.0 
    Total      p = (A + C)/N      q = (B + D)/N      p + q = 1.0 


The single age comparisons in the foregoing example could, of course, 
be made by means of proportions. This could be done by formula (5.6), 
the discussion of which should be reviewed at this time. Let us examine 
the connection between the χ² technique and the D/S_D for proportions 
method of testing the significance of the difference between two groups, the 
individuals of which have been classified as either passing or failing, saying 
either yes or no, possessing or not possessing a characteristic, etc. All such 
comparisons begin with a fourfold frequency table of the type symbolized 
in Table 13.4, or an equivalent (the frequencies may have been recorded for 
only one category of the dichotomy, say the yeses, from which the 
frequencies for the other category may be readily inferred by subtraction). 
Table 13.7 contains the basic table of frequencies for the presence (+) or 
absence (−) of a characteristic for groups 1 and 2, and the basic table of 
proportions obtained by dividing the frequencies by the proper Ns as 
indicated. Note that the p and q values on the bottom margin are the 
proportions to use in formula (5.6) for the standard error of the difference 
between p₁ and p₂. Note also that p₁ = A/N₁ = A/(A + B) and that 
p₂ = C/N₂ = C/(C + D). 
In order to avoid carrying along a square root sign or radical, and for 
another reason which if not now obvious will soon become so, let us write 
the squared expression for the critical ratio of the difference between 
the two proportions, p₁ and p₂, thus, 

\[ \frac{D^2}{S_D^2} = \frac{(p_1 - p_2)^2}{\dfrac{pq}{N_1} + \dfrac{pq}{N_2}} \]

When we replace all the proportions by their equivalents involving 
frequencies, and also substitute frequencies for N₁ and N₂, we have 

\[
\begin{aligned}
\frac{D^2}{S_D^2}
&= \frac{\left[\dfrac{A}{A + B} - \dfrac{C}{C + D}\right]^2}
        {\dfrac{[(A + C)/N]\,[(B + D)/N]}{A + B} + \dfrac{[(A + C)/N]\,[(B + D)/N]}{C + D}} \\[6pt]
&= \frac{\left[\dfrac{AC + AD - AC - BC}{(A + B)(C + D)}\right]^2}
        {\dfrac{(A + C)(B + D)(C + D) + (A + C)(B + D)(A + B)}{N^2 (A + B)(C + D)}} \\[6pt]
&= \frac{(AD - BC)^2 N^2}
        {(A + B)(C + D)\left[(A + C)(B + D)(C + D) + (A + C)(B + D)(A + B)\right]} \\[6pt]
&= \frac{(AD - BC)^2 N^2}{(A + B)(C + D)(A + C)(B + D)(A + B + C + D)} \\[6pt]
&= \frac{(AD - BC)^2 N}{(A + B)(C + D)(A + C)(B + D)}
\end{aligned}
\]

which equals χ² as computed by formula (13.1) for the fourfold table. This 
confirms a fact already mentioned, that for 1 degree of freedom χ² is the 
same as the square of the critical ratio. Since formula (5.6) is applicable 
only to proportions based on independent samples, it follows 
that χ² is similarly restricted. That is, χ² as computed from a fourfold 
table by (13.1) does not allow for any correlational factor which might be 
introduced because the two groups consist of paired or matched individuals, 
or for the correlational factor which would be present if p₁ and p₂ (or the 
corresponding frequencies) were based on the same individuals as in a 
pretest, intervening experience, posttest situation. 
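
The identity between χ² for a fourfold table and the squared critical ratio can be confirmed numerically. The sketch below uses the age 6 frequencies of Table 13.6, entered in the order of Table 13.7 (A and C are the + frequencies for groups 1 and 2); the function names are hypothetical, and the standard error is assumed to be the pooled form pq/N₁ + pq/N₂ described in connection with formula (5.6).

```python
import math

def critical_ratio(a, b, c, d):
    """D / S_D for two independent proportions, using the pooled p and q."""
    n1, n2 = a + b, c + d
    n = n1 + n2
    p1, p2 = a / n1, c / n2
    p = (a + c) / n                     # pooled proportion of + responses
    q = (b + d) / n                     # pooled proportion of - responses
    se = math.sqrt(p * q / n1 + p * q / n2)
    return (p1 - p2) / se

def chi2_fourfold(a, b, c, d):
    """Formula (13.1)."""
    n = a + b + c + d
    return n * (b * c - a * d) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Age 6 in Table 13.6: boys 18 passing, 84 failing; girls 8 passing, 93 failing.
z = critical_ratio(18, 84, 8, 93)
chi2 = chi2_fourfold(18, 84, 8, 93)
print(round(z, 3), round(z ** 2, 2), round(chi2, 2))
```

The square of the critical ratio and χ² agree to within floating-point error, as the algebra requires.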
Significance of changes. The student should carefully note that 
although the application of χ² to fourfold tables of frequencies like that of 
Table 5.2, which is here reproduced with minor changes as Table 13.8, 
provides a means of testing the significance of the association or correlation 
between two sets of responses, such an application does not test the 
significance of change from the first to the second set of responses. This latter 
test can be made by means of formula (5.5). It is also possible to test the 
significance of any found change by the use of χ². To do this we first note 
that a net change for the group must necessarily involve the difference 
between the frequencies, A and D, since the B and C cases represent those 

Table 13.8. Fourfold table of frequencies and proportions for a first set vs. a 
second set of responses from the same individuals 

    Frequencies 
                      2nd 
                  −        +        Total 
    1st    +      A        B        A + B 
           −      C        D        C + D 
    Total       A + C    B + D        N 

    Proportions 
                      2nd 
                  −        +        Total 
    1st    +      a        b         p₁ 
           −      c        d         q₁ 
    Total         q₂       p₂        1.0 



who showed no change. The null hypothesis would be that the universe 
frequencies are not different; i.e., for a given sample, A and D would differ 
only as a result of chance sampling. Since A + D represents the total 
number of individuals who changed (the As from + to −, and the Ds from 
− to +), in setting up the null hypothesis concerning the net change it 
would seem appropriate to say that, if A + D individuals changed, 
(A + D)/2 would change in one direction and (A + D)/2 in the other 
direction. Thus (A + D)/2 would become the expected frequency; then 
A − (A + D)/2 and D − (A + D)/2 would become the discrepancies 
between observed and expected (on the basis of the null hypothesis) 
frequencies. If A = D, both discrepancies would become zero. Squaring 
each discrepancy and dividing by E and then summing the two quotients 
or doubling either one will give a χ² which is based on 1 degree of freedom 
(why 1 degree of freedom?). A little algebraic manipulation shows that 

\[ \chi^2 = \frac{(A - D)^2}{A + D} \tag{13.2} \]

for the particular situation in which we wish to test the significance of 
over-all changes. 

Comparison of formula (13.2) with formula (5.3) shows that we again 
have a χ², with 1 degree of freedom, which equals the square of an x/σ. 
The reasoning back of the statement given on p. 56 that formulas (5.3), 
[...], and (5.5) are inapplicable unless A + D equals 10 or more should 
now be clearer to the reader. If A + D were less than 10, the two Es 
would be less than 5, an acceptable though none too conservative lower 
limit. [...] (correction for continuity) needed when the Es are smaller 
than 10 will be given shortly. One thing which may puzzle the reader at 
this time is the fact that formula (13.2) does not contain a total N. Its 
algebraic equivalent, (D/σ_D)², with σ_D calculated by formula (5.5), does 
[...]. 
The advantage of the χ² over the D/σ_D technique for testing the 
significance of net changes in responses lies in the fact that χ² values for two or 
more groups which have been used in an experiment can be summed to a 
total χ² with df equal to the sum of the separate dfs; in this case n equals the 
number of chi squares being summed. 

Formula (13.2) is, of course, not restricted to situations involving 
changes in responses. If we have the same individuals giving, say, yes or 
no responses to two questions and we desire to test the significance 
of the difference between the frequencies (or proportions) of yeses or noes, 
formula (13.2) is applicable. Or suppose we wish to know whether there is 
a significant difference in difficulty of two test items which have been 
administered to the same group. For example, in Table 13.5 we have 49 
and 68 individuals passing items 1 and 2 respectively. Since N = 100, 
the proportions are .49 and .68 (or 49 and 68 per cent). By formula (13.2), 
χ² = (29 − 10)²/(29 + 10) = 9.26, which for 1 degree of freedom 
falls between the .01 and .001 levels of significance; hence, it would be 
concluded that the two items are different in difficulty. If we use 
formula (5.5), z = (p₂ − p₁)/σ_D = (.68 − .49)/√[(.10 + .29)/100] 
= .19/.0625 = 3.04, which leads to the same probability figure as that 
for a χ² of 9.26. Either method may be used. Both make due allowance 
for the correlation which is present because the frequencies or proportions 
being compared are based on the same individuals. 
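
The item-difficulty example can be reproduced with a short sketch. The function names are hypothetical; the critical ratio is written in the form used in the worked example above, with the two proportions of "changers" (here .29 and .10) summed and divided by N under the radical.

```python
import math

def chi2_change(a, d):
    """Formula (13.2): chi square from the two cells in which responses differ."""
    return (a - d) ** 2 / (a + d)

def z_change(a, d, n):
    """Critical ratio: (a - d)/n divided by sqrt((a/n + d/n) / n)."""
    diff = (a - d) / n
    se = math.sqrt((a / n + d / n) / n)
    return diff / se

a, d, n = 29, 10, 100      # differing cells and N from the first table of Table 13.5
chi2 = chi2_change(a, d)
z = z_change(a, d, n)
print(round(chi2, 2), round(z, 2), round(z ** 2, 2))
```

The two cells in which both items are passed or both are failed do not enter the computation at all, which is why no total N appears in formula (13.2).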

Correction for continuity. As pointed out earlier, when df = 1 the 
use of χ² is not safe if any E is less than 5. For fourfold contingency 
tables, an allowance for discontinuity can be made by applying Yates’ 
correction, which should always be used when any one E is 5 to 10. A 
small E is likely to occur either when the total N is small or when one 
or both of the marginal totals involve extreme dichotomies. It is easy to 
determine the smallest E by dividing the product of the two smaller 
marginal totals by N. With Yates’ correction, formula (13.1) takes the form 

\[ \chi^2 = \frac{N\left(|BC - AD| - N/2\right)^2}{(A + B)(C + D)(A + C)(B + D)} \tag{13.3} \]

which indicates that the absolute difference between AD and BC is to be 
reduced by N/2. Formula (13.2) can also be written to include a correction 
for continuity. The corrected form 

\[ \chi^2 = \frac{\left(|A - D| - 1\right)^2}{A + D} \tag{13.4} \]

involves decreasing the absolute value of the difference between A and D 
by 1. Formula (13.4) is to be preferred to (13.2) when A + D is less than 
20. The reasoning back of Yates’ correction is precisely the same as that 
given on p. 45 of Chapter 5. 
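
A minimal sketch of the two corrected formulas follows. The function names and the small frequencies are hypothetical, chosen only so that the smallest expected frequency of the fourfold table lies between 5 and 10 and A + D is below 20; they are not data from the text.

```python
def chi2_fourfold_yates(a, b, c, d):
    """Fourfold chi square with |BC - AD| reduced by N/2 before squaring."""
    n = a + b + c + d
    numerator = n * (abs(b * c - a * d) - n / 2) ** 2
    return numerator / ((a + b) * (c + d) * (a + c) * (b + d))

def chi2_fourfold_uncorrected(a, b, c, d):
    """Formula (13.1), for comparison."""
    n = a + b + c + d
    return n * (b * c - a * d) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi2_change_corrected(a, d):
    """Formula (13.4): |A - D| reduced by 1 before squaring."""
    return (abs(a - d) - 1) ** 2 / (a + d)

# Hypothetical fourfold table; its smallest expected frequency is 12 * 11 / 24 = 5.5.
corrected = chi2_fourfold_yates(8, 4, 3, 9)
uncorrected = chi2_fourfold_uncorrected(8, 4, 3, 9)
# Hypothetical counts of individuals who changed, with A + D = 15.
change = chi2_change_corrected(11, 4)
print(round(corrected, 2), round(uncorrected, 2), round(change, 2))
```

In both cases the correction lowers χ², making the test more conservative when frequencies are small.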

One-tailed vs. two-tailed test. It will be recalled from our discussion 
of the sampling distribution of χ² that the Ps obtainable from Table D are 
the probabilities of the chance occurrence of as large a χ² as that observed; 
that is, levels of significance such as P = .05 or .01 or .001 are based on one 
(the right-hand) tail of the sampling distribution of χ². Does this mean 
that it is a one-tailed test in the hypothesis testing sense discussed earlier 
(pp. 61-63)? Let us recall a couple of facts. First, when using the 
binomial to indicate something of the nature of the χ² distribution we saw 
that both tails of the binomial were combined as one tail of the χ² 
distribution. Second, for 1 degree of freedom χ² = (x/σ)². Now an x/σ of 1.96 
corresponds to the P = .05 level as a two-tailed test. The square of 1.96 
gives a χ² of 3.84, which we can see from Table D also corresponds to the 
.05 level. Hence the Ps, for 1 degree of freedom, read from Table D are 
equivalent to those based on the two-tailed test despite the fact that only 
one tail of the χ² distribution is involved. 

If the decision to be made or the hypothesis to be tested calls for a 
one-tailed test, the Ps from Table D need to be halved: a χ² of 5.41 
(instead of 6.64) is required for the .01 level, and a χ² of 2.71 (instead of 
3.84) gives the .05 level. Incidentally, for 1 degree of freedom, a χ² P, 
obviously, can be obtained by entering its square root into the normal 
curve table; whether such a P from x/σ is based on one or both tails of the 
normal distribution depends on the hypothesis being tested. As we 
proceed, the student should convince himself that the notion of direction 
of differences, hence the idea of a one-tailed test, does not make sense in 
other applications of χ². 
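
The relation between the tabled P for 1 degree of freedom and the one- and two-tailed normal probabilities can be verified directly. In this sketch the function names are hypothetical, and `math.erfc` from the standard library supplies the normal tail areas in place of a table.

```python
import math

def p_two_tailed_df1(chi2):
    """Tabled P for df = 1: the two-tailed normal P for a deviate of sqrt(chi2)."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def p_one_tailed_df1(chi2):
    """Halved P, for use when the hypothesis calls for a one-tailed test."""
    return p_two_tailed_df1(chi2) / 2.0

for value in (3.84, 6.64):          # two-tailed .05 and .01 points
    print(value, round(p_two_tailed_df1(value), 3))
for value in (2.71, 5.41):          # one-tailed .05 and .01 points
    print(value, round(p_one_tailed_df1(value), 3))
```

The halved probabilities for 2.71 and 5.41 come out at the .05 and .01 levels, matching the values quoted above.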

Comparison of two or more correlated proportions. Formula (13.2) 
has been extended to provide a method for testing whether three or more 
nonindependent proportions (or sets of frequencies) differ significantly 
among themselves. For example, we may have pass-fail (or yes-no, or 
some other dichotomous) information on C items (or questions) for N 
individuals; or we may have only one item with responses from N persons 
under C different conditions; or one item with responses from N sets of C 
matched persons each, that is, C matched groups. 

Data from such situations can be arranged in a table consisting of N 
rows and C columns. The total number of passes (yeses) in a given 
column divided by N will, of course, be the proportion of passes (or yeses) 
in that column. Do these C proportions (or the totals) differ significantly 
in an over-all sense? The null hypothesis is that all the proportions are the 
same except for chance. To test the null hypothesis we will need to obtain 
not only the column totals (number of passes) but also a similar total for 
each of the N rows. Let T stand for the total in any column and X stand 
for the total in any row. This X is a sort of “score” for the person: his 
number of passes (or yeses) on the C items. The sampling distribution of 
the quantity 

\[ Q = \frac{(C - 1)\left[C\sum T^2 - \left(\sum T\right)^2\right]}{C\sum X - \sum X^2} \tag{13.5} \]

follows the χ² distribution with C − 1 degrees of freedom for N large 
(N > 30, presumably). 

The computation of Q is so easy that it need not be illustrated. If an 
obtained Q exceeds the χ² required for a chosen level of significance, we 
conclude that the (correlated) proportions do differ in an over-all sense, 
that is, they are not homogeneous. 
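
Although the computation is simple, a short sketch may help fix the roles of T and X. The function name and the six-person data set are hypothetical (far fewer than the N > 30 suggested for the χ² approximation) and serve only to show the arithmetic; the second data set rebuilds the first fourfold table of Table 13.5 as 100 pairs of responses to show that, for C = 2, Q reduces to formula (13.2).

```python
def cochran_q(rows):
    """Q of formula (13.5); rows is a list of N lists, each holding C zeros or ones."""
    c = len(rows[0])
    col_totals = [sum(row[j] for row in rows) for j in range(c)]   # the Ts
    row_totals = [sum(row) for row in rows]                        # the Xs
    sum_t = sum(col_totals)
    sum_t2 = sum(t * t for t in col_totals)
    sum_x = sum(row_totals)
    sum_x2 = sum(x * x for x in row_totals)
    return (c - 1) * (c * sum_t2 - sum_t ** 2) / (c * sum_x - sum_x2)

# Hypothetical pass (1) / fail (0) records for 6 persons on 3 items.
data = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 0], [0, 0, 0], [1, 1, 0]]
q = cochran_q(data)

# Two-item case built from the first table of Table 13.5 (cells 29, 39, 22, 10).
pairs = [[1, 0]] * 29 + [[1, 1]] * 39 + [[0, 0]] * 22 + [[0, 1]] * 10
q_two_items = cochran_q(pairs)
print(q, round(q_two_items, 2))
```

Persons who pass every item or fail every item add nothing to the denominator, just as the unchanged cases drop out of formula (13.2).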

Chi square for 2 by k tables. The calculation of χ² from a table 
with two rows and k columns (or two columns and k rows) can be 
accomplished by way of expected cell frequencies calculated as previously 
suggested from the marginal totals or by means of 

\[ \chi^2 = \frac{N^2}{A_t B_t}\left[\sum \frac{B_i^2}{A_i + B_i} - \frac{B_t^2}{A_t + B_t}\right] \tag{13.6} \]

in which the As and Bs have the meanings indicated in Table 13.9, 
wherein will be found the frequencies for two groups classified according 
to five response categories. The necessary computations required by 
formula (13.6) are also included in the table. Note that, as usual, the 
marginal totals are first found by summing across and down. Column D 
is obtained by dividing the entries in column B by the adjacent values in 
column C, and column E results from multiplying the D column values by 
the B column figures. These same operations, when applied to the last (or 
totals) line, lead to the column E entry of 49.44, which is the value of the 
B_t²/(A_t + B_t) term in formula (13.6). Summing the first five figures in 
column E yields 50.83, or the Σ term of (13.6), and the difference between 
50.83 and 49.44 is 1.39, the value of the bracketed part of the formula. 
When this is multiplied by N²/A_t B_t, we have χ², which for a df of 4 yields a 
Table 13.9. The calculation of χ² from a 2 by k table: two groups 
and k (= 5) responses 

                 Col. A          Col. B          Col. C       Col. D           Col. E 
    Response     Group I         Group II        A_i + B_i    B_i/(A_i + B_i)  B_i²/(A_i + B_i) 

    1             27 (= A_1)      15 (= B_1)        42          .3571             5.36 
    2             26 (= A_2)      16 (= B_2)        42          .3810             6.10 
    3            247 (= A_3)     110 (= B_3)       357          .3081            33.89 
    4             41 (= A_4)       8 (= B_4)        49          .1633             1.31 
    5             39 (= A_5)      15 (= B_5)        54          .2778             4.17 
                                                                                 ----- 
                                                                                 50.83 
    Totals       380 (= A_t)  +  164 (= B_t)  =    544 (= N)    .3015            49.44 
                                                                                 ----- 
                                                                                  1.39 

    χ² = [544² / (380)(164)] (1.39) = (4.75)(1.39) = 6.60 

    n = 4, P = .16 


P of about .16. In other words, about once in six trials differences as large as 
those in Table 13.9 would occur by chance; hence we have insufficient 
evidence for concluding that the universes from which these two samples 
were drawn differ in regard to their responses to the question asked. 
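
The computation laid out in Table 13.9 can be sketched as follows, together with a check against the longer route through expected frequencies. The function names are hypothetical, and the closed-form probability in the last step is an assumption of this sketch that is valid only for df = 4. Because no intermediate rounding is done, the result comes out slightly below the hand-computed 6.60.

```python
import math

def chi2_two_by_k(a, b):
    """Formula (13.6) for two groups (lists a and b) over k response categories."""
    a_t, b_t = sum(a), sum(b)
    n = a_t + b_t
    s = sum(bi * bi / (ai + bi) for ai, bi in zip(a, b))     # the summed term
    return n * n / (a_t * b_t) * (s - b_t * b_t / n)

def chi2_from_expected(a, b):
    """Same statistic from expected cell frequencies based on the marginal totals."""
    a_t, b_t = sum(a), sum(b)
    n = a_t + b_t
    total = 0.0
    for ai, bi in zip(a, b):
        col = ai + bi
        for obs, group_total in ((ai, a_t), (bi, b_t)):
            e = group_total * col / n
            total += (obs - e) ** 2 / e
    return total

group_1 = [27, 26, 247, 41, 39]    # column A of Table 13.9
group_2 = [15, 16, 110, 8, 15]     # column B of Table 13.9
chi2 = chi2_two_by_k(group_1, group_2)
p = math.exp(-chi2 / 2) * (1 + chi2 / 2)    # upper-tail P, valid for df = 4 only
print(round(chi2, 2), round(chi2_from_expected(group_1, group_2), 2), round(p, 2))
```

The shortcut and the expected-frequency method give the same χ², and the probability agrees with the tabled figure of about .16.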

If we had to depend on the D/S_D technique for testing the significance of 
the group differences in Table 13.9, five z ratios would result, since for each 
category there is a possible difference in proportions or percentages with a 
standard error for each difference. The five zs might, and usually would, 
lead to five different P values with a consequent predicament as to 
interpretation. Offhand, it might be argued that, if any z so determined reached 
an acceptable level of significance, we would be justified in concluding that 
the difference between the groups was real rather than chance. That such 
an argument may be fallacious is well illustrated by the data of Table 13.9, 
which are actual data. When these data first came to the author’s attention, 
the table was in percentage form with a z worked out only for the category 
showing the largest difference. This z, based on formula (5.6), was 2.54, 
which is near the P = .01 level of significance, and it had accordingly been 
concluded that a real difference had been found. Now, when we consider 
the χ² P of .16 for the over-all comparison, we are not justified in placing 
much confidence in such a conclusion. 

Why the apparent inconsistency between two tests of significance? 
Since most investigators are looking for group differences rather than 
group similarities, there is the tendency to single out a category for 
comparison, not because of intrinsic a priori interest in that category but 
because it happens to yield the largest difference. By this a posteriori 
selection, there is a tendency to capitalize on differences which may be 
large mainly as a result of chance. A similar situation occurs when we have 
the means for several groups: the largest of the possible differences may 
be the largest partly or entirely as a result of chance. Thus the use of χ² 
for such situations as are exemplified in Table 13.9 not only provides an 
over-all single index of significance, but also helps us avoid false 
conclusions. 

Application to k by l tables. Consider the data of Table 13.10, which 
contains a contingency-type table involving three groups and three possible 

Table 13.10. Table of frequency of three possible responses for three groups of 
individuals; percentages in the parentheses add downward to 100* 

    Motivation of                          Group 
    Conscientious 
    Objectors              I             II             III          Total 

    Not cowards         24 (27.0)     56 (53.8)      71 (69.6)       151 
    Partly cowards      30 (33.7)     23 (22.1)      19 (18.6)        72 
    Cowards             35 (39.3)     25 (24.0)      12 (11.8)        72 
    Ns                  89 (100.0)   104 (99.9)     102 (100.0)      295 

* Data from Leo Crespi, J. Psychol., 1945, 19, p. 285. 


opinion responses. To test the significance of the differences between the 
groups by use of the D/S_D technique would involve comparing the 
percentages for group I vs. II, I vs. III, and II vs. III, for each of the three 
responses, a total of nine zs. Straightforward computation gives χ² 
= 36.58, which for df = 4 is double the value of the χ² needed for the 
P = .001 point. From Elderton’s table we find that P is about .000001; 
hence Table 13.10 as a whole exhibits highly significant differences between 
the groups. 

Perhaps a better understanding of the extent of the differences can be had
by considering the percentages given in parentheses in the table. Membership
in group III means a greater tendency to the "not cowards" response.
Group I tends more to give the "cowards" response. Now it happens that
the three groups, I, II, and III, can be (and are) placed in an ordered series
for amount of education: grammar school, high school, and college
respectively. Thus the association shown in the table is in the direction of
less disparagement of conscientious objectors by those in the higher
educational level. The strength of association or degree of correlation is
represented by a contingency coefficient of .33, which may seem rather
low in light of the highly significant χ² P. This illustrates a point which
most readers will already have grasped: high statistical significance and a
high degree of association are far from synonymous. Consideration of the
data of Table 13.10 readily indicates the difficulty of predicting responses
when the extent of association is represented by a C of .33.

As in the 2 by k table, so here it is better to calculate an over-all χ²
before examining by the z technique any of the possible separate
comparisons. Unless the χ² P is significant, it is unwise to proceed with such
comparisons.

The calculation of χ² for a k by l table is greatly facilitated by the
following procedure. Let the observed frequencies be represented by fs as in
Table 13.11. Divide the square of each cell frequency by N_c, the total N
for its column; sum these quotients across, one sum for each row; divide
each of these sums by the respective total row frequency, n_r; add these
quotients, deduct 1, and multiply by the grand total N. The result is χ².
The first set of quotients (kl in number) should be carried to two decimals,
and the second set of quotients should be carried to three decimals. The
computational process is given in symbols in Table 13.11. The first
subscript to f indicates the row and the second the column. Σ_c means sum
with c taking on in turn the column designations, 1, 2, 3.


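Table 13.11. Schema for calculating χ² from a k by l table

               1                   2                   3             Total
  1      f_11  f²_11/N_1     f_12  f²_12/N_2     f_13  f²_13/N_3      n_1    Σ_c f²_1c/N_c
  2      f_21  f²_21/N_1     f_22  f²_22/N_2     f_23  f²_23/N_3      n_2    Σ_c f²_2c/N_c
  3      f_31  f²_31/N_1     f_32  f²_32/N_2     f_33  f²_33/N_3      n_3    Σ_c f²_3c/N_c
Total    N_1                 N_2                 N_3                  N

$$\chi^2 = N\left[\frac{\sum_c f_{1c}^2/N_c}{n_1} + \frac{\sum_c f_{2c}^2/N_c}{n_2} + \frac{\sum_c f_{3c}^2/N_c}{n_3} - 1\right]$$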

Goodness of fit. The use of χ² in testing the goodness of fit of a
theoretical curve to an observed frequency distribution is illustrated in
Table 13.12. We start with an actual distribution, usually with more
grouping intervals than in our example, and the descriptive statistical
measures therefor. In fitting the normal curve to the distribution of
Table 13.12, we need N, M, and S. To set up for each interval the frequency
which would hold for the best-fitting normal curve, we go through the
tedious process of determining the proportionate area under the theoretical
curve for each interval. Once the proportions are known, each is multiplied
by N to secure the expected frequencies. The proportions are ascertained
by calculating the x/S value of the boundary limits of the intervals. For
example, the 110-119 interval may be thought of as running from 109.5 to
119.5 since IQs are rounded to the nearest integer. Then (109.5
- 104.56)/16.99 = .2907 as the x/S for the lower limit, and (119.5
- 104.56)/16.99 = .8793 as the x/S for the upper limit, which is also the
lower limit of the 120-129 interval. Of course, the difference between these
two values is the same as 10/16.99 or .5886, which is the interval width
expressed in x/S units. Adding .5886 once to .2907 gives .879 (it is
sufficient to retain three decimals); adding it twice gives 1.468, and so on.
[...] The figures at top and bottom are the number of cases expected [...]
distribution with which we started.

Table 13.12. Goodness of fit of normal curve to Stanford-Binet IQs, form M

                     x/S at     Proportionate
 IQ         O      lower limit      area          E     O - E   (O - E)²/E

160         3 }
150        13 } 16    2.645         .0041         12       4       1.33
140        55         2.057         .0158         47       8       1.36
130       120         1.468         .0512        152     -32       6.74
120       330          .879         .1186        352     -22       1.38
110       610          .291         .1958        582      28       1.35
100       719         -.298         .2316        688      31       1.40
 90       592         -.886         .1950        579      13        .29
 80       338        -1.475         .1177        350     -12        .41
 70       130        -2.064         .0506        150     -20       2.67
 60        48        -2.652         .0155         46       2        .09
 50         7 }
 40         4 } 12                  .0040         12       0        .00
 30         1 }

         2970 = N                   .9999       2970       0      17.02 = χ²

M = 104.56          df = 11 - 3 = 8; P = .03
S = 16.99

Straightforward calculation gives a χ² of 17.02. With df = 8 (the
number of intervals minus the number of constants used in the fitting),
P = .03; i.e., only 3 times in 100 would as large a χ² arise by chance, or
only 3 times in 100 would we get a worse fit if the universe were
distributed in the normal curve fashion. [...] Note that a smaller value
for χ² for the example of Table 13.12 would not prove that the universe is
normal [...]. When P = .01, it is said that chance sampling would lead to a
worse fit only once in 100 times; when P = .99, it is said that chance
sampling would lead to a better fit only once in 100 times. In other words,
if P is between .05 and .01, the hypothesis that the universe distribution is
of the normal type (or whatever type was fitted) is questionable; if P is .01
or less, this hypothesis is rejected; if P is between .95 and .99, we may
suspect the fit as being too good; if P is .99 or more, we should definitely
look for an error in calculation or for some type of restraint on the
operation of chance. Too good a fit is as open to question as too poor a
fit. If P is between .05 and .95, the fit is said to be satisfactory.
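
The fitting just described may be sketched in Python as follows. This is an illustration under stated assumptions rather than the tabular method itself: areas are taken from the error function instead of Table A, expected frequencies are not rounded to whole numbers, and the function names are our own. The observed frequencies, M, and S are those of Table 13.12, with the tail intervals combined as in the table.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Proportion of the unit normal curve lying below z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def expected_frequencies(boundaries, mean, sd, n):
    """Expected counts for the two open tails and the intervals between."""
    cumulative = [0.0]
    cumulative += [normal_cdf((b - mean) / sd) for b in boundaries]
    cumulative += [1.0]
    return [n * (hi - lo) for lo, hi in zip(cumulative, cumulative[1:])]

# Observed frequencies from lowest to highest interval (below 60, 60-69, ..., 150 up).
observed = [12, 48, 130, 338, 592, 719, 610, 330, 120, 55, 16]
boundaries = [59.5 + 10 * i for i in range(10)]   # 59.5, 69.5, ..., 149.5
n_cases = sum(observed)

expected = expected_frequencies(boundaries, 104.56, 16.99, n_cases)
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 3   # three constants (N, M, S) used in the fitting
print(round(chi_square, 2), df)
```

Because the expected frequencies are carried unrounded here, the resulting χ² need not agree to the last decimal with a value built from whole-number Es.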

When the goodness of fit of frequency curves is being tested, the df
depends on the number of grouping intervals and on the number of
restrictions imposed, or the ways in which the expected distribution is made
to agree with the observed distribution. The general principle behind the
determination of df for χ² as a test of fit may be illustrated for the case of
testing the goodness of fit of the normal curve. The expected and observed
frequencies are made to agree with respect to N, M, and S. Suppose that
we have k grouping intervals and that we let f_i stand for the frequency in
the ith interval and X_i for its score value (midpoint), and that x_i represents
the corresponding deviation score value for this midpoint. Then the
following equations will hold:

$$f_1 + f_2 + f_3 + \cdots + f_i + \cdots + f_k = N$$
$$f_1 X_1 + f_2 X_2 + f_3 X_3 + \cdots + f_i X_i + \cdots + f_k X_k = NM$$
$$f_1 x_1^2 + f_2 x_2^2 + f_3 x_3^2 + \cdots + f_i x_i^2 + \cdots + f_k x_k^2 = NS^2$$

If all the f values were known except f_1, f_2, and f_3, those parts to
the right of the f_3 term in the first of these equations could be added
numerically. The resulting sum could be shifted to the right of the equality
sign and then combined numerically with N, giving an equation of the
type f_1 + f_2 + f_3 = A, where A equals N minus the sum of all the
frequencies save the first three. Likewise, the parts beyond the f_3 term in each
of the other two equations could be summed numerically, shifted to the
right of the equality sign, and combined with NM and NS². This would
lead to three equations in f_1, f_2, and f_3:

$$f_1 + f_2 + f_3 = A$$
$$f_1 X_1 + f_2 X_2 + f_3 X_3 = B \quad \text{(say)}$$
$$f_1 x_1^2 + f_2 x_2^2 + f_3 x_3^2 = C \quad \text{(say)}$$

It is a well-known principle of algebra that three equations in three
unknowns will be satisfied (if solvable) by just one set of values for the
unknowns. For our particular problem, this means that, as soon as the
frequencies for all but three (any three) intervals are known, these three
remaining frequencies are not "free to vary"; they are fixed because of the
requirements that the frequencies or functions thereof must add to N, NM,
and NS². We accordingly lose 3 degrees of freedom, and therefore when
we are testing the fit of a normal curve to a distribution with k intervals,
the df is k - 3.
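
The loss of three degrees of freedom can be made concrete with a small sketch in Python. The five-interval distribution below is hypothetical, chosen only for illustration, and the helper names are our own. Three frequencies are treated as unknown and then recovered from the three restrictions (N, NM, and NS²) by solving the three equations exactly with fractions.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3 x 3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, rhs):
    """Solve a 3 x 3 linear system by Cramer's rule."""
    base = det3(m)
    solution = []
    for col in range(3):
        replaced = [list(row) for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / base)
    return solution

# Hypothetical distribution: midpoints X_i and frequencies f_i for k = 5 intervals.
midpoints = [Fraction(v) for v in (1, 2, 3, 4, 5)]
freqs = [Fraction(v) for v in (2, 5, 9, 6, 3)]

N = sum(freqs)
M = sum(f * X for f, X in zip(freqs, midpoints)) / N
dev = [X - M for X in midpoints]                      # deviation scores x_i
NS2 = sum(f * x * x for f, x in zip(freqs, dev))

# Treat f_1, f_2, f_3 as unknown; move the known terms to the right-hand side.
A = N - sum(freqs[3:])
B = N * M - sum(f * X for f, X in zip(freqs[3:], midpoints[3:]))
C = NS2 - sum(f * x * x for f, x in zip(freqs[3:], dev[3:]))

matrix = [[Fraction(1)] * 3, midpoints[:3], [x * x for x in dev[:3]]]
recovered = solve3(matrix, [A, B, C])
print([int(v) for v in recovered])
```

Once the other k - 3 frequencies are given, the three restrictions leave no freedom for the remaining three, which is the sense in which 3 degrees of freedom are lost.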

Although the Kolmogorov-Smirnov (K-S) test does not involve χ², we
include it here since it also provides a test of goodness of fit. The procedure
is relatively simple. The k observed frequencies are converted to cumulative
frequencies, which are divided by N to secure cumulative proportions.
For the given M and S, the proportions per interval expected on the basis
of the normal curve are calculated (e.g., the proportionate area column of
Table 13.12), then cumulated. We thus have two sets of cumulative
proportions. The k pairs of values are examined to find the largest pair
difference, D; that is, the largest discrepancy between observed and
expected. (This D is, of course, in proportion, not percentage, units.)

The sampling distribution of D is such that for N greater than [...], D
must reach:

1.22/√N for significance at the P = .10 level
1.36/√N for significance at the P = .05 level
1.63/√N for significance at the P = .01 level
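
As a rough check on the K-S procedure, the sketch below (Python, with names of our own choosing) finds D for the data of Table 13.12 from the observed frequencies and the proportionate areas printed there. The coefficients are obtained from the leading term of the large-sample Kolmogorov distribution, c = sqrt(-ln(α/2)/2); this formula is an assumption brought in for illustration and is not given in the text.

```python
from itertools import accumulate
from math import log, sqrt

def ks_critical_coefficient(alpha):
    """Large-sample coefficient c such that D must reach c / sqrt(N)."""
    return sqrt(-0.5 * log(alpha / 2.0))

# Table 13.12, lowest interval first: observed counts and proportionate areas.
observed = [12, 48, 130, 338, 592, 719, 610, 330, 120, 55, 16]
expected_prop = [.0040, .0155, .0506, .1177, .1950, .2316,
                 .1958, .1186, .0512, .0158, .0041]

n_cases = sum(observed)
cum_observed = [c / n_cases for c in accumulate(observed)]
cum_expected = list(accumulate(expected_prop))

# D is the largest absolute gap between the two cumulative series.
d_stat = max(abs(o - e) for o, e in zip(cum_observed, cum_expected))
critical_05 = ks_critical_coefficient(0.05) / sqrt(n_cases)
print(round(d_stat, 4), round(critical_05, 4))
```

Comparing `d_stat` with `critical_05` gives the K-S verdict at the .05 level for the same distribution that was tested by χ² above.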

The advantages of the K-S test over the χ² test for goodness of fit are
twofold: the K-S test is applicable for smaller N, and it is a more powerful
test than the χ² test. The latter advantage means that departure from
normal form is more apt to be detected by the K-S test. Stated differently,
compared to the χ² test the K-S test is less apt to mislead us into accepting
the hypothesis of normality of distribution.

Although the method of fitting set forth in Table 13.12 should aid in
comprehending the meaning of goodness of fit (an observed frequency
contrasted with an expected frequency obtained via the concept of area
under a normal curve for the interval), there is a computationally shorter
method for calculating the E values. The x/S value of the midpoint of
each interval is determined, and then the ordinate (or y value) for each
x/S value is written down from Table A. The products, each y times iN/S
(i being the interval width), provide the series of Es for the intervals. For
the K-S test, y times i/S will yield the expected proportions, which are then
cumulated.
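
The shorter ordinate method can be sketched as follows (Python; the function names are our own, and the ordinate is computed from the normal curve formula rather than read from Table A). The example takes the 110-119 interval of Table 13.12, whose midpoint is assumed to be 114.5, the center of the limits 109.5 to 119.5.

```python
from math import exp, pi, sqrt

def normal_ordinate(z):
    """Height y of the unit normal curve at z."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def expected_by_ordinate(midpoint, mean, sd, width, n):
    """Expected frequency approximated as y times iN/S."""
    z = (midpoint - mean) / sd
    return normal_ordinate(z) * width * n / sd

e_110 = expected_by_ordinate(114.5, 104.56, 16.99, 10, 2970)
print(round(e_110, 1))
```

The ordinate value is an approximation to the area value for the same interval, so the two Es will be close but not identical.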
The χ² test can be used to test the difference between two observed
distributions (this becomes a 2 by k situation, with k - 1 the df);
the K-S test has been extended for the same purpose. But for comparing
two observed distributions, both the χ² test and the K-S test have serious
drawbacks: significance may reflect difference in location parameters,
or in variances, or in distribution shape, or in any combination of these.

In this chapter we have discussed the essential nature of χ² and have
pointed out its applications. By now the student should appreciate the
advantages of χ² over percentage comparisons and have some insight into
the use of χ² as a means of testing hypotheses.

EXACT OR DIRECT PROBABILITIES 

The χ² Ps obtainable from Table D are approximations in that areas
under a continuous curve are taken as estimates of values which form a
point distribution. Even with Yates' correction for continuity the
approximation is none too good when E values are less than 5. This raises
the question as to the criterion for judging the closeness of such
approximations, and the answer is that for situations involving 1 degree of
freedom it is possible to specify exact probabilities. How?

First, consider the problem of deciding on the basis of a specified
number of successes whether a chap can distinguish between two cigarette
brands. We learned in Chapter 5 that the exact P for the probability of as
many correct identifications can be obtained by the binomial distribution;
we need not use the normal curve or the χ² approximation. But such
approximations not only are very convenient computationally for N (or n)
[...] accurate enough. In checking a χ² P against an exact P
derived from the binomial, we must bear in mind the possibility of
[...] two-tailed tests; both methods should be alike in this
regard.

Second, consider the χ² test of the significance of change (or difference
between two correlated frequencies or proportions) given by formula
(13.2). We can obtain an exact P for this situation by resorting to the
binomial (see p. 54). Again, in calculating the binomial P, we must give
consideration to whether we had intended a one-tailed or a two-tailed test.

Third, consider the fourfold table for which formula (13.1) is appropriate
for testing the significance of association or the significance of the
difference between two groups. For this situation the binomial is not
applicable (except when the frequencies are equal on one, or both, of
the margins). Exact Ps can be obtained for such tables by a rather tedious
procedure which we shall now describe. It can be shown that the probability
of a particular set of frequencies, A, B, C, and D, for fixed margins is

$$P = \frac{(A + B)!\,(C + D)!\,(A + C)!\,(B + D)!}{N!\,A!\,B!\,C!\,D!}$$

To obtain a P comparable to the χ² significance test, we would also need
the Ps for all sets of frequencies deviating farther than the observed set
from the null values of no association. This can be made clearer by an
example. Table 13.13 shows an observed set (part I) and sets showing
higher association (parts II and III). Note that each part is derived from
the preceding part by subtracting 1 from both A and D and adding 1 to
both B and C. This process is continued until A or D or both become zero.
Note that the marginal frequencies remain the same.

Table 13.13. Series of fourfold table frequencies required for calculating P
directly and exactly

            I                       II                      III

        -    +                  -    +                  -    +
   +    3    9   12        +    2   10   12        +    1   11   12
   -    6    2    8        -    7    1    8        -    8    0    8
        9   11   20             9   11   20             9   11   20

Application of the foregoing formula to each table in turn will yield 
the probability for each set of frequencies, and the sum of these Ps will be 
the probability of as great association (in the given direction) as that 
indicated by the starting (observed) set of frequencies. We have 


$$P_{\mathrm{I}} = \frac{(12!)(8!)(11!)(9!)}{(20!)(3!)(9!)(6!)(2!)} = .0367$$

$$P_{\mathrm{II}} = \frac{(12!)(8!)(11!)(9!)}{(20!)(2!)(10!)(7!)(1!)} = .0031$$

$$P_{\mathrm{III}} = \frac{(12!)(8!)(11!)(9!)}{(20!)(1!)(11!)(8!)(0!)} = .0001$$


The sum of these separate probabilities gives P = .0399, or .04 (to two
decimals) as the probability of obtaining sets as extreme (in one direction)
as the set observed in part I of Table 13.13. If the situation calls for a
two-tail test, the mere doubling of the calculated one-tailed P will give the
exact probability of as large a difference (or as great an association)
irrespective of direction only when the marginal frequencies are identical,
that is, in a setup such as Table 13.4, only when A + B = C + D;
otherwise, exactly the same degree of association in the opposite direction
cannot occur, and the doubling of the one-tailed P will only approximate
the required two-tailed P. Consider again the left-hand fourfold table in
Table 13.13. Association in the opposite direction would call for a majority
of the cases in the upper-left and lower-right cells, but how many cases for
as great a negative association as the observed positive association? One
criterion would be to say that for the negative association the frequencies
A, B, C, D should be such that, with the margins unchanged, the value of
(BC - AD) must equal (9 x 6 - 3 x 2) or 48 except for having the
opposite sign. [Note: the value of (BC - AD) enters into both the
fourfold point r and the contingency coefficient for the 2 x 2 table.] If we
try

        8    4   12
        1    7    8
        9   11   20

we get (BC - AD) = -52, which would indicate a greater degree of
association than the 48. If we try 7 for A, we would have (BC - AD)
= (5 x 2 - 7 x 6) = -32, which leads to a lesser degree of association
than the 48. It is simply impossible with nonsymmetrical marginal
frequencies to have the same degree of negative as positive association.

Table 13.14. Sets of frequencies for negative association as
strong as the positive association in I of Table 13.13

            I                       II

        8    4   12             9    3   12
        1    7    8             0    8    8
        9   11   20             9   11   20

The best that we can do to obtain a two-tailed P is to consider all
possible sets of frequencies that give rise to negative association as great as
the observed positive association. For this, only the two tables given in
Table 13.14 qualify. For I of Table 13.14 we have

$$P_{\mathrm{I}} = \frac{12!\,8!\,11!\,9!}{20!\,8!\,4!\,1!\,7!} = .0236$$

and for II we have

$$P_{\mathrm{II}} = \frac{12!\,8!\,11!\,9!}{20!\,9!\,3!\,0!\,8!} = .0013$$

These summed give a P of .0249. When this is added to the P of
.0399 for the positive association we get a P of .0648 as the P for the
two-tailed test, a value which does not correspond to either of the one-tailed
Ps doubled.
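
The direct method lends itself to a short program. The sketch below is in Python with function names of our own; it rewrites the factorial formula in terms of binomial coefficients, which is algebraically the same thing. It sums the probabilities for the series of tables in each direction, starting from part I of Table 13.13 and from part I of Table 13.14.

```python
from math import comb

def table_probability(a, b, c, d):
    """Probability of one fourfold table with all margins fixed."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def tail_probability(a, b, c, d, step):
    """Sum the Ps while A and D change by `step` and B and C by -step."""
    total = 0.0
    while min(a, b, c, d) >= 0:
        total += table_probability(a, b, c, d)
        a, d, b, c = a + step, d + step, b - step, c - step
    return total

positive_tail = tail_probability(3, 9, 6, 2, step=-1)   # Table 13.13, parts I to III
negative_tail = tail_probability(8, 4, 1, 7, step=1)    # Table 13.14, parts I and II
two_tailed = positive_tail + negative_tail
print(round(positive_tail, 4), round(negative_tail, 4), round(two_tailed, 4))
```

The stopping rule mirrors the text: the series ends once a cell has reached zero and a further step would make it negative.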

The argument regarding the effect of asymmetry on the reciprocity of
one- and two-tailed Ps also holds for those situations involving fairly
sizable Ns when either or both margins of the fourfold table depart
markedly from a 50-50 split. If the smallest E is less than, say, 10, the χ²
P (even with Yates' correction) when halved does not lead to an entirely
valid one-tailed P value.

The computation of the separate Ps, laborious even with an ordinary
table of logarithms, is greatly facilitated by a table of the logarithms of
factorials, such as Table XLIX of Part I of Pearson's Tables for statisticians
and biometricians. For Ns up to 28, special tables are available (see Table I
of S. Siegel's Nonparametric statistics, New York: McGraw-Hill, 1956)
for judging whether the exact probabilities reach certain commonly used
levels of significance.




Chapter 14 
INFERENCES ABOUT 
VARIABILITIES 


We now return to the problem of statistical inference based on measures
for continuous variables. This chapter is concerned with inference
regarding variances and differences between variances, and presents a basic
theoretical distribution which will serve in later chapters when we again
discuss tests based on means.

First, we digress from our main interest in this chapter to present a
simple algebraic development of the standard error of the mean formula
which will make it unnecessary to "take on faith" a part of the next
derivation concerning variance estimation. Let X̄ (the symbol which we
will use henceforth) represent a sample mean. Suppose a random sample
of size N and that we pull R successive samples (with replacements if R is
large and the population size is finite). We can arrange each set of N
scores to form a row, with random assignment to the 1st, 2nd, ..., Nth
position. We will have R rows and we can regard the positions as forming
columns, N in number. Our table will consist of R rows and N columns of
scores with no particular meaning attached to the columns because
assignment thereto is a chance matter. With r standing for the rth row
and c for the cth column, we will have a table like Table 14.1.

Summing across columns leads to a ΣX (= NX̄) for each row. These
row sums will, of course, vary from row to row (we long ago learned that
means vary from sample to sample). Each of the R sums is the result of
summing N scores and thus may be regarded as a sort of "total score" for
the row obtained by adding N parts, each column contributing a part or a
component. With variation within each column, this summing is as though
N variables were being summed to get a total score. Since the assignment
to columns depends on chance, there will be no correlation between
columns when R is infinitely large; hence the sums for rows will have a
variance given by the well-known variance theorem for a sum:

$$\sigma^2_{\Sigma X} = \sigma^2_{N\bar{X}} = \sigma^2_1 + \sigma^2_2 + \cdots + \sigma^2_c + \cdots + \sigma^2_N$$

Table 14.1. Score layout for R replications
(successive samples) of size N

           1     2    ...    N

   1       X     X    ...    X      ΣX = NX̄_1
   2       X     X    ...    X      ΣX = NX̄_2
   .
   r       X     X    ...    X      ΣX = NX̄_r
   .
   R       X     X    ...    X      ΣX = NX̄_R

We have used the symbol σ instead of S because, when we regard R as
infinitely large, we are dealing with theoretical rather than observed
variances. Under the condition of infinite R, all columns will have the
same variance, σ²_X, and their sum will simply be Nσ²_X. That is, σ²_NX̄
= Nσ²_X. But this is the variance of N times the row means; if we divide
this variance by N² we have the variance of the means themselves. (If this
last step is not immediately clear, recall that for any variable, Y, we have
σ_aY = aσ_Y, where a is a positive constant.) Thus

$$\frac{\sigma^2_{N\bar{X}}}{N^2} = \frac{N\sigma^2_X}{N^2} = \frac{\sigma^2_X}{N}$$

the square root of which is σ_X/√N, the familiar formula for the standard
error of the mean. In practice, we need to estimate σ_X: by S_X if the
sample is large and by s_X if small. No claim is made that this rather simple
derivation would satisfy the step-by-step rigor required by contemporary
mathematical statisticians.
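
The result can be illustrated, though of course not proved, by drawing a large but finite number of samples. The sketch below is in Python; the population is assumed normal with σ = 10, and the sample size, number of replications, and function name are arbitrary choices of our own. The variance of the R sample means should come out near σ²/N.

```python
import random
from statistics import fmean, pvariance

def variance_of_sample_means(n, replications, sigma, seed=0):
    """Draw `replications` samples of size n and return the variance of their means."""
    rng = random.Random(seed)
    means = [
        fmean([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(replications)
    ]
    return pvariance(means)

sigma, n = 10.0, 25
observed_variance = variance_of_sample_means(n, replications=10000, sigma=sigma)
theoretical_variance = sigma ** 2 / n      # sigma squared over N
print(round(observed_variance, 2), theoretical_variance)
```

With R finite the agreement is only approximate; it tightens as the number of replications grows.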

Estimation of variance. To show that s² = Σx²/(N - 1) is an unbiased
estimate of σ², we need to show that the average of a very large (infinite)
number of s² values is σ². Such a mean value is called an expected value.
If the expected value of a measure corresponds to the population value,
the measure is said to be an unbiased estimate.

Suppose s²_r is the estimate based on the rth sample and that we let
x_r = X - X̄_r and x'_r = X - μ be deviation scores from the sample
mean, X̄_r, and the population mean, respectively. If we subtract x_r
from x'_r we have

$$x'_r - x_r = (X - \mu) - (X - \bar{X}_r) = \bar{X}_r - \mu$$

hence

$$x'_r = x_r + (\bar{X}_r - \mu) \quad \text{and} \quad {x'_r}^2 = [x_r + (\bar{X}_r - \mu)]^2$$

Then

$$\sum {x'_r}^2 = \sum x_r^2 + \sum (\bar{X}_r - \mu)^2 + 2(\bar{X}_r - \mu)\sum x_r$$

Now Σx_r = 0 and Σ(X̄_r - μ)² = N(X̄_r - μ)², thus leading to

$$\sum {x'_r}^2 = \sum x_r^2 + N(\bar{X}_r - \mu)^2$$

or

$$\sum x_r^2 = \sum {x'_r}^2 - N(\bar{X}_r - \mu)^2$$

This permits us to write

$$s_r^2 = \frac{\sum x_r^2}{N - 1} = \frac{\sum {x'_r}^2}{N - 1} - \frac{N(\bar{X}_r - \mu)^2}{N - 1}$$

Suppose we have R replications (samples) and that we average the R
estimates:

$$\frac{\sum_r s_r^2}{R} = \frac{\sum_r \dfrac{\sum {x'_r}^2}{N - 1}}{R} - \frac{\sum_r \dfrac{N(\bar{X}_r - \mu)^2}{N - 1}}{R}$$

$$= \frac{1}{N - 1} \cdot \frac{\sum_r \sum {x'_r}^2}{R} - \frac{N}{N - 1} \cdot \frac{\sum_r (\bar{X}_r - \mu)^2}{R}$$

$$= \frac{N}{N - 1} \cdot \frac{\sum_r \sum {x'_r}^2}{NR} - \frac{N}{N - 1} \cdot \frac{\sum_r (\bar{X}_r - \mu)^2}{R}$$


Now as R becomes infinitely large (or large enough to exhaust the
population, if finite in size), the first term will involve the sum of the
squares of all the scores in the population about the population mean, and
hence will be the population variance, σ², multiplied by the N/(N - 1)
factor. With R infinitely large, the second term yields N/(N - 1) times
the variance of an infinite number of sample means about the population
mean, or the sampling variance of means, which is σ²/N. Thus, this term
becomes

$$\frac{N}{N - 1} \cdot \frac{\sigma^2}{N} = \frac{\sigma^2}{N - 1}$$

Factoring out σ²/(N - 1), the mean or expected value of s² becomes

$$\frac{\sigma^2}{N - 1}(N - 1) = \sigma^2$$

and therefore s² is an unbiased estimate of the population variance.

If the student follows through the foregoing development with N instead
of N - 1 as the divisor, he will discover that S² = Σx²/N is a biased
estimator.
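
A sampling experiment makes the bias visible. The following sketch is in Python; the normal population with σ² = 100, the sample size of 5, and the function name are assumptions chosen for illustration. It averages s² and S² over many replications: the first average should lie near σ², the second near σ²(N - 1)/N.

```python
import random
from statistics import fmean

def average_variance_estimates(n, replications, sigma, seed=1):
    """Return the averages of s^2 (divisor N - 1) and S^2 (divisor N)."""
    rng = random.Random(seed)
    unbiased, biased = [], []
    for _ in range(replications):
        sample = [rng.gauss(50.0, sigma) for _ in range(n)]
        mean = fmean(sample)
        sum_sq = sum((x - mean) ** 2 for x in sample)   # sum of squared deviations
        unbiased.append(sum_sq / (n - 1))
        biased.append(sum_sq / n)
    return fmean(unbiased), fmean(biased)

mean_s2, mean_S2 = average_variance_estimates(n=5, replications=40000, sigma=10.0)
print(round(mean_s2, 1), round(mean_S2, 1))
```

The small N is deliberate: the shortfall of S², a factor of (N - 1)/N, is most noticeable when samples are small.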


Variance and χ²

The student who peruses the statistical literature will encounter the
relationship, χ² = NS²/σ² = (N - 1)s²/σ².
Since s² is a random value varying from sample to sample, this implies
that the random sampling distribution of S² and of s² is related to the
random sampling distribution of χ². We will now attempt to build up a
connection between the two.

Consider the binomial situation with n elements (n coins, n dice, etc.),
with p the probability of success on a single element. The mean number
of successes, np, may be regarded as the expected frequency of success,
and the mean number of failures, nq, may be regarded as the expected
frequency of failure. A trial toss (or roll) will lead to an observed number
(frequency) of successes, O_s, and an observed frequency of failures, O_f.
(If, e.g., 8 of 10 coins show heads, O_s = 8 and O_f = 2; O_s + O_f = n.)

We have

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(O_s - np)^2}{np} + \frac{(O_f - nq)^2}{nq}$$

which may be rewritten as

$$\chi^2 = \frac{q(O_s - np)^2}{npq} + \frac{p(O_f - nq)^2}{npq}$$

But for this χ² with 1 degree of freedom the numerical (absolute) value
of (O_f - nq) = (O_s - np); that is, the discrepancy is the same except for
sign, which sign disappears in the squaring. We may, therefore, replace
the (O_f - nq)² of the second term by its exact equivalent, (O_s - np)², thus
giving

$$\chi^2 = \frac{q(O_s - np)^2 + p(O_s - np)^2}{npq} = \frac{(O_s - np)^2 (q + p)}{npq}$$

Since q + p = 1, we get

$$\chi^2 = \frac{(O_s - np)^2}{npq}$$
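
The algebra can be checked numerically with a few lines of Python. The function names are our own; the first example is the 8-heads-in-10-coins case mentioned above, and the second (3 successes in 12 rolls with p = 1/6) is a hypothetical case added for illustration.

```python
def chi_square_two_terms(successes, n, p):
    """Chi-square as the sum of the success and failure terms."""
    q = 1.0 - p
    failures = n - successes
    return ((successes - n * p) ** 2 / (n * p)
            + (failures - n * q) ** 2 / (n * q))

def chi_square_one_term(successes, n, p):
    """Chi-square in the reduced form (O_s - np)^2 / npq."""
    return (successes - n * p) ** 2 / (n * p * (1.0 - p))

coins = (chi_square_two_terms(8, 10, 0.5), chi_square_one_term(8, 10, 0.5))
dice = (chi_square_two_terms(3, 12, 1 / 6), chi_square_one_term(3, 12, 1 / 6))
print(coins, dice)
```

Both forms give the same value for any O_s, n, and p, apart from rounding in the arithmetic.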


The (O_s - np) is the deviation of a random value from a (theoretical)
mean value, hence may be regarded as an x; and npq = σ² is nothing more
than [...] χ² when the df is 1. [...]

[...] individuals, A and B. [...] in the general binomial situation we
[...] have two O values and the corresponding χ² values. The χ² for A's
outcome could, according to the preceding [...], letting O_A represent his
frequency of successes, [...] as

$$\chi^2_A = \frac{(O_A - np)^2}{npq} = \frac{x_A^2}{\sigma^2_x}$$

and

$$\chi^2_B = \frac{(O_B - np)^2}{npq} = \frac{x_B^2}{\sigma^2_x}$$

[...]

$$\chi^2_N = \frac{(O_N - np)^2}{npq} = \frac{x_N^2}{\sigma^2_x}$$

[...]

$$\sum \chi^2 = \frac{\sum x^2}{\sigma^2_x}$$

[...] degrees of freedom, the value of Σx²/σ² will also follow the chi square
distribution with N df. But this Σx²/σ² differs in that the
xs here used are deviations from a theoretical mean instead of (as usual)
from an observed mean. [...] the df for this χ² is N. For the usual
situation, x = X - X̄, and the df is N - 1. [...] We have the two variance
estimates, S² = Σx²/N and s² = Σx²/(N - 1), so that Σx² = NS²
= (N - 1)s², and therefore

$$\chi^2 = \frac{\sum x^2}{\sigma^2} = \frac{NS^2}{\sigma^2} = \frac{(N - 1)s^2}{\sigma^2} \tag{14.1}$$

with df = N - 1. That is, both NS²/σ² and (N - 1)s²/σ² [...]. (Since [...]
pertain to the same variable, X, we drop the subscript to σ.)

Inference based on a single variance. The relationship set forth in
(14.1) permits us to use either S² or s² as the basis for testing a hypothesis
about a population variance and also for establishing confidence limits for
σ². Although situations rarely arise in psychological research where logic
[...] a hypothesis regarding σ², we will illustrate the procedure, with
[...] the more important task of setting a confidence interval for σ².

Suppose an S² of 100 or an s² of 105 based on N = 21 and the hypothesis
that σ = 16, or σ² = 256. If the hypothesis is true, we have

$$\chi^2 = \frac{(21)(100)}{256} = \frac{(21 - 1)(105)}{256} = 8.21$$

with N - 1 = 20 degrees of freedom. When we turn to Table D we find
that the P associated with a χ² of 8.21, df = 20, is .99, which is
interpreted by saying that a χ² greater than 8.21 would occur .99 of the time
by chance; hence, a value as small as 8.21 has a P of .01. That is, if
σ² = 256 is true, then only .01 of the time would we get a sample S² as low
as 100 or a sample s² as low as 105. Accordingly, we would reject the
hypothesis at the .01 level of significance.

Next let us suppose the sample S² is 457.96 or s² is 480.86. This
gives χ² = (21)(457.96)/256 = (21 - 1)(480.86)/256 = 37.57, which falls
at the P = .01 point. The probability of an S² as great as 457.96, or an s²
as great as 480.86, as a deviation from σ² = 256 is .01; again, we would
reject the hypothesis.

Turning now to the setting of confidence limits, the student will recall
that when ascertaining the .95 confidence interval for a population mean
the procedure was to find t for the .05 significance level for the given df.
Call this t.05; then the limits are given by X̄ ± t.05 s_X̄. The t.05 corresponds
to the t that cuts off .025 at each end of the t distribution. This suggests that
when setting the .95 confidence interval for σ² we would need to find the χ²
value that cuts off the top .025 area and the χ² that cuts off the bottom
.025 for the given df. Or if we had, for the df, the values for χ² that cut off
the top and the bottom .01 area, we would have values to use in setting
.98 confidence limits. For the latter limits, our rule-of-thumb procedure
will be to find, for the given df, χ².99 as the χ² that cuts off the lower .01
area and χ².01 as the χ² that cuts off the upper .01 area.

Returning now to the example with S² = 100 (or s² = 105) and df = 20
we find from Table D that χ².99 is 8.26 and χ².01 is 37.57. We now ask
what values of σ² will yield these two χ²s. Using 8.26, we have

$$\chi^2 = 8.26 = \frac{(21)(100)}{\sigma^2} = \frac{(21 - 1)(105)}{\sigma^2} = \frac{2100}{\sigma^2}$$

as an equation to solve for σ². Thus, σ² = 2100/8.26 = 254.24. Using
37.57, we have χ² = 37.57 = 2100/σ² from which we get σ² = 55.89. Note
that the lower χ².99 point leads to the upper limit, whereas the higher
χ².01 leads to the lower limit, for σ². If we take the square roots of the
two σ²s, we get 7.47 and 15.94 as the limits for the .98 confidence interval
for the population standard deviation.
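
The arithmetic of the confidence limits is easily mechanized. In the sketch below (Python, with a function name of our own) the two χ² points, 8.26 and 37.57, are simply entered as read from the table for df = 20; nothing here computes the χ² distribution itself.

```python
from math import sqrt

def variance_confidence_limits(sum_of_squares, chi_lower_tail, chi_upper_tail):
    """Limits for sigma squared: the larger chi-square point gives the lower limit."""
    lower = sum_of_squares / chi_upper_tail
    upper = sum_of_squares / chi_lower_tail
    return lower, upper

sum_sq = 21 * 100          # N times S squared, the same as (N - 1) times s squared
low_var, high_var = variance_confidence_limits(sum_sq, 8.26, 37.57)
low_sd, high_sd = sqrt(low_var), sqrt(high_var)
print(round(low_var, 2), round(high_var, 2), round(low_sd, 2), round(high_sd, 2))
```

Machine rounding may differ from hand values in the last decimal place.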

DIFFERENCES BETWEEN VARIATIONS

Formulas (6.10) and (6.11) for the standard error of the difference
between standard deviations, for N not small, have been given earlier
[...]. When testing the difference between two standard deviations or
variances we must, as always, distinguish between situations involving
correlated values and situations in which the measures are independent
(or based on independent samples). The methods about to be presented
are applicable for both small and large samples and are based on differences
between variances rather than differences between standard deviations.

Differences between correlated variances. Correlated variabilities arise
when we have two forms of a psychological test administered to the same
group with an S or s for each form, or when we have the S for a first trial
and the S for a later trial for the same sample, or Ss for the performance of
one group under different experimental conditions, or Ss based on two
groups (N pairs of individuals) related by blood or related by matching.
For such situations the difference between variations can be tested by

$$t = \frac{(s_1^2 - s_2^2)\sqrt{N - 2}}{\sqrt{4 s_1^2 s_2^2 (1 - r_{12}^2)}} \tag{14.2}$$

or its exact equivalent with s_1 and s_2 replaced by S_1 and S_2. This t
follows the t distribution with N - 2 degrees of freedom.
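
Formula (14.2) may be sketched in Python as follows. The function name and the numerical values (variances of 100 and 64, a correlation of .60, and 27 pairs) are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt

def correlated_variance_t(var_1, var_2, r_12, n_pairs):
    """Return t and its df for the difference between two correlated variances."""
    numerator = (var_1 - var_2) * sqrt(n_pairs - 2)
    denominator = sqrt(4.0 * var_1 * var_2 * (1.0 - r_12 ** 2))
    return numerator / denominator, n_pairs - 2

t_value, df = correlated_variance_t(100.0, 64.0, 0.60, 27)
print(round(t_value, 3), df)
```

The resulting t is then referred to the t table with the returned df.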

Differences between independent variances. For the purpose of test¬ 
ing the difference between uncorrelated 5s or is. Professor R. A Fisher 

developed the mathematics of the sampling distribution of a function 
designated by z and defined as iunction 

2 = log, i, — log,. i 2 (143) 

If successive samples are drawn from a single universe or from two 
universes having the same variance, the sampling variation of z will center 
at zero and depend on n r and „„ the two dfs Note that the sampfing 
distribution is independent of the universe value of the variance of 
standard deviation. In other words, we do not require an estimate of a 

thf n ste r 4 er ,T r WhlC . h U uses infor mation from the samples, as required for 
the standard error of the difference between 5s. Probability table! for the z 
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function are available by which we can, for given dfs, i.e., * and n a , find 
how large z must be for the .05, the .01, and the .001 levels of significance 
The z, defined by formula (14.3), has one disadvantage: logarithms must 
be used' Since (14.3) can be written in the equivalent form 

(14.4) 


z = - log. 


S 2 
S 1 


it is seen that, instead of the difference between two logarithms, we have z
as a function of the ratio of the two estimated variances. From the
sampling distribution of one-half the log of a ratio, the sampling
distribution of the ratio itself can be inferred. For $n_1$ = 5 and $n_2$ = 16, the value
of z which will be exceeded 1 per cent of the time by chance (the .01
probability level) is .7450. This is one-half the log of the ratio of the
two variances, and hence the log of the ratio would be 1.4900; by
reference to a table of natural logarithms the antilog of 1.4900 is found to be
4.44. That is, as large a ratio as 4.44 would occur .01 of the time by chance. In
order to avoid the necessity of using logs, Professor George W. Snedecor
has developed tables for the variance ratio, which is defined as

\[ F = \frac{s_1^2}{s_2^2} \tag{14.5} \]
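The passage from z to the variance ratio can be checked with a few lines of Python. This is a small sketch only; the function names are hypothetical, and the figure .7450 is the tabled z quoted in the text for 5 and 16 degrees of freedom.

```python
import math

def z_to_variance_ratio(z):
    """Since z = 0.5 * ln(s1^2 / s2^2), the ratio of variances is exp(2z)."""
    return math.exp(2.0 * z)

def variance_ratio_to_z(ratio):
    """Inverse relation: one-half the natural log of the variance ratio."""
    return 0.5 * math.log(ratio)

ratio_at_01 = z_to_variance_ratio(0.7450)
print(round(ratio_at_01, 2))  # 4.44, the antilog of 1.4900
```

The two functions are inverses of each other, so a table of F and a table of z carry the same information.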


The equation* of the sampling distribution of F contains two ns: $n_1$ for
the df upon which $s_1^2$ is based, and $n_2$ as the df for $s_2^2$. This means that there
is a sampling distribution curve of F for each possible combination of $n_1$
and $n_2$. The probability table for F must accordingly be entered with $n_1$
and $n_2$ in order to learn what level of significance a given F reaches. To use
Table F of the Appendix, we take the larger of the two variance estimates
as the numerator in computing F, and the df for this larger estimate is
symbolized as $n_1$ regardless of any system of subscripts that may have been
used to designate the two groups. Thus the F that is used with the table
is always unity or greater, even though the sampling distribution of F
involves values less than unity. That is, if we were drawing successive
samples from groups A and B and each time took F as $s_A^2/s_B^2$, regardless of
which was the larger estimate, the sampling distribution of F would
obviously involve values below unity as well as above unity. The table,
however, is set up in terms of the greater-than-unity side of the sampling
distribution.


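* \[ y = \frac{\Gamma\left(\frac{n_1 + n_2}{2}\right)}{\Gamma\left(\frac{n_1}{2}\right)\Gamma\left(\frac{n_2}{2}\right)} \, n_1^{n_1/2} \, n_2^{n_2/2} \, \frac{F^{(n_1 - 2)/2}}{(n_1 F + n_2)^{(n_1 + n_2)/2}} \]
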
If we wish to judge whether two samples, either large or small, yield a
difference in variability which is large enough to warrant concluding that
the two population variabilities differ, we set up the null hypothesis that no
difference exists in the two population variances. Then, instead of dealing
as usual with the difference between the two estimates, we take their ratio.
Obviously, the departure of this ratio, F, from unity reflects or depends on
the difference between the two variance estimates. If the value of F,
computed with the larger estimate in the numerator, is so large that it is not
reasonable to believe it a chance deviation from a true value of unity, the
null hypothesis is rejected, and it is concluded that the two populations
do not have the same variance. If F is small, i.e., near unity, the null
hypothesis is accepted.

Now it happens that, although the F values given in Table F for the .05,
the .01, and the .001 levels of significance hold for the major and very
extensive uses of the F table to be discussed in Chapters 15-18, these
values are not applicable to the simple case where we wish to ascertain the
probability of as great a difference (irrespective of direction, i.e., a
hypothesis or decision requiring a two-tailed test) between the variances for two
groups. For this particular case, an F which falls at, say, the .01 level
signifies that as large a difference in one direction would occur 1 per cent of
the time by chance. This is so because in placing the larger estimate in the
numerator we are considering only one tail of the F distribution. In
judging whether two variance estimates of, say, 10 and 25, based on two
groups, lead to an F which departs significantly from unity
(no difference), we should consider not only the probability of securing an
F as large as 25/10 but also the probability of obtaining one as small as
10/25. This, it will be observed, is exactly analogous to considering both
positive and negative values for the z of formula (14.3) and then raising
the question as to the probability of obtaining on a chance basis as large a
difference irrespective of direction. If we had this last probability we
would halve it to obtain the P for one direction only; conversely, if we
had an F which fell at the P = .01 level in the table, we would need to
double .01 to secure the probability for as large a difference irrespective of
direction. In other words, for this particular case, that of testing the
significance of the difference between the variability for two groups, an F
at the .01 point of the table means significance at the .02 level; an F at the
.05 level means significance at the .10 level; and an F at the .001 level
indicates significance at the .002 level. We will not have to make this type
of adjustment when we come to the principal uses of F in connection with
the analysis of variance.

For example, suppose that 50.21 and 147.62 are variance estimates
available for two samples of eight and nine cases respectively. The
respective dfs would be 7 and 8. In computing F we have 147.62/50.21
= 2.94, and $n_1$ becomes 8, with $n_2$ = 7. Turning to Table F, we see that F
would need to be 3.73 for the .05 level, which for this type of problem is
the .10 level. Therefore the null hypothesis is not rejected. If we take the
square roots of the two variance estimates, we get ss of 7.09 and 12.15. By
the F test we are in effect saying that the difference between these two ss
is not significant. As usual, this does not prove the null hypothesis; it
becomes acceptable because we cannot with sufficient certainty reject it.
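The arithmetic of this example can be retraced in Python. The sketch below is illustrative only: the helper name is hypothetical, and the critical value 3.73 is simply the Table F entry quoted in the text, not something the code computes.

```python
import math

def variance_ratio(var_a, df_a, var_b, df_b):
    """Return (F, n1, n2) with the larger variance estimate in the numerator."""
    if var_a >= var_b:
        return var_a / var_b, df_a, df_b
    return var_b / var_a, df_b, df_a

# Variance estimates from samples of eight and nine cases (dfs 7 and 8).
F, n1, n2 = variance_ratio(50.21, 7, 147.62, 8)

tabled_F_at_05 = 3.73             # Table F entry for n1 = 8, n2 = 7, as quoted in the text
two_tailed_level = 2 * 0.05       # the tabled .05 point is the .10 level for this use
reject_null = F >= tabled_F_at_05

print(round(F, 2), n1, n2)                                          # 2.94 8 7
print(round(math.sqrt(50.21), 2), round(math.sqrt(147.62), 2))      # 7.09 12.15
print(reject_null, two_tailed_level)
```

Placing the larger estimate on top inside the helper mirrors the convention of Table F, so the caller need not worry about which group was labeled first.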

If the research hypothesis being tested or the decision to be made calls
for a one-tailed test, the F values in Table F are applicable without further
ado. As a matter of fact, if the null hypothesis is to be accepted unless
$s_A^2$ is significantly larger than $s_B^2$, we would not bother to compute F if $s_A^2$
turned out to be smaller than $s_B^2$.

Differences between several independent variances. We have seen in
Chapter 13 that $\chi^2$ can be used to provide an over-all test of the difference
between several independent proportions (p. 230) for C groups and also
between C correlated proportions (p. 227). In the next chapter we shall
see how an over-all test can be made for the differences between several
means, either correlated or independent. We shall consider now an
over-all test of the difference between three or more variance estimates. This
test is not applicable when the variances are correlated (based on the
same group or matched groups).

Suppose we have k variance estimates, $s_1^2, s_2^2, \ldots, s_k^2$, based on
$m_1, m_2, \ldots, m_k$ cases, hence with $(m_i - 1)$ degrees of freedom.
Let N be the sum of the ms. Compute the products: each $s^2$ times its df.
Sum these k products (the equivalent of summing the k sums of squares of
deviations). Let $s_m^2$ stand for this sum divided by N - k. Determine the
log of each of the k $s^2$ values, then calculate the products: each log $s^2$ times
the df for the given $s^2$. Sum these products, that is, $\sum (m_i - 1) \log s_i^2$, in
which i takes on values from 1 to k. Determine the log of $s_m^2$, and compute

\[ c = 1 + \frac{1}{3(k - 1)} \left[ \sum \frac{1}{m_i - 1} - \frac{1}{N - k} \right] \]

Finally, calculate the quantity

\[ V = \frac{2.3026}{c} \left[ (N - k) \log s_m^2 - \sum (m_i - 1) \log s_i^2 \right] \tag{14.6} \]

The sampling distribution of V follows the $\chi^2$ distribution with k - 1
degrees of freedom. If V reaches the P = .05 or P = .01 or any a priori
chosen level of significance, the differences between the k variances may be
regarded as nonchance, hence the conclusion that the k groups have not
been drawn from populations having equal variances. If V is not
significant, the hypothesis that the populations have the same variance is
accepted, and the variances are said to be homogeneous. The procedure
just described is known as Bartlett's test for the homogeneity of variances.
It is appropriate for testing the assumption of
homoscedasticity in bivariate correlation scattergrams.
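The steps leading to formula (14.6) can be sketched in Python as follows. The function name and the illustrative variances are hypothetical. One assumption should be noted plainly: the sketch works with natural logarithms throughout, so the factor 2.3026 (which converts common logarithms to natural logarithms) does not appear.

```python
import math

def bartlett_v(variances, sizes):
    """Return (V, df) for k independent variance estimates.

    variances: the k estimates s_i^2
    sizes:     the k sample sizes m_i (each estimate has m_i - 1 df)
    """
    k = len(variances)
    if k < 2 or k != len(sizes):
        raise ValueError("need matching lists for at least two groups")
    dfs = [m - 1 for m in sizes]
    n_total = sum(sizes)                      # N, the sum of the ms
    pooled_df = n_total - k                   # N - k
    # s_m^2: the sum of (df * s^2) divided by N - k
    pooled = sum(df * s2 for df, s2 in zip(dfs, variances)) / pooled_df
    # Correction factor c
    c = 1.0 + (sum(1.0 / df for df in dfs) - 1.0 / pooled_df) / (3.0 * (k - 1))
    # Natural logs are used, so no 2.3026 multiplier is needed.
    v = (pooled_df * math.log(pooled)
         - sum(df * math.log(s2) for df, s2 in zip(dfs, variances))) / c
    return v, k - 1

# Hypothetical case: two groups of 11 cases with variance estimates 1 and 4.
v, df = bartlett_v([1.0, 4.0], [11, 11])
print(round(v, 2), df)  # 4.25 1
```

The V so obtained would then be referred to a $\chi^2$ table with k - 1 degrees of freedom. When all k estimates are equal, V is zero, as one would expect for perfectly homogeneous variances.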

F, $\chi^2$, t, AND z (NORMAL DEVIATE)

Since F involves the ratio of two variance estimates and since there is a
connection between a variance and $\chi^2$, it is possible to write F as a function
of two $\chi^2$s. Recall that $(N - 1)s^2/\sigma^2 = \chi^2$ with df of N - 1. Solving for
$s^2$ we have $s^2 = \sigma^2\chi^2/(N - 1)$. Thus $s_1^2 = \sigma_1^2\chi_1^2/n_1$ and
$s_2^2 = \sigma_2^2\chi_2^2/n_2$, in which $n_1$ and $n_2$ are the dfs. Then

\[ F = \frac{s_1^2}{s_2^2} = \frac{\sigma_1^2 \chi_1^2 / n_1}{\sigma_2^2 \chi_2^2 / n_2} \]

Under the null hypothesis condition that $\sigma_1^2 = \sigma_2^2$ we see that

\[ F = \frac{\chi_1^2 / n_1}{\chi_2^2 / n_2} \]

or that F is the ratio of two $\chi^2$ variables, each divided by its df. This
is the more fundamental definition of the F ratio and also serves as
the basis for the derivation of the sampling distribution of F under the null
condition. Table F gives the values of F for three levels of significance.
If an obtained F is significantly large, we simply say that the sample value
of F is not consistent with the null hypothesis, or that $\sigma_1^2$ exceeds $\sigma_2^2$.

If the estimate $s_2^2$ is based on an infinitely large df (i.e., $n_2$ = infinity),
it will equal $\sigma_2^2$, or if $\sigma_2^2$ happens to be a known theoretical variance (e.g.,
the variance of a binomial), we can write F as

\[ F = \frac{s_1^2}{\sigma_2^2} = \frac{\sigma_1^2 \chi_1^2 / n_1}{\sigma_2^2} \]

in which we do not replace the denominator by a $\chi^2$ equivalent because it
is a constant, hence cannot possibly vary as a $\chi^2$ variable. Again, under
the null condition that $\sigma_1^2 = \sigma_2^2$, the two variances cancel, leaving
$F = \chi_1^2/n_1$ when $n_2 = \infty$. This means that in the $\infty$ lines of Table F each
entry is a $\chi^2$ divided by its df; hence, for any obtained $\chi^2$, we may
divide by its df and enter the quotient in the $\infty$ lines of Table F to learn
whether it reaches one of the three levels of significance.

When $n_1$ = 1, we see that $F = \chi^2$. But a $\chi^2$ with 1 df equals
$(x/\sigma)^2$, or $z^2$, where z is a unit normal deviate; hence $F = z^2$ when
$n_2 = \infty$ and $n_1$ = 1. The first column entries of the $\infty$ lines of Table F
are the squares of $x/\sigma$ values.

If $n_1$ = 1 and $n_2$ varies, we have

\[ F = \frac{\sigma_1^2 \chi_1^2}{s_2^2} \]

in which we deliberately did not replace $s_2^2$ by its $\chi^2$ equivalent. The
numerator involves a $\chi^2$ with 1 df, hence $\chi_1^2 = x_1^2/\sigma_1^2$, in which $x_1$ is a
normal deviate with variance $\sigma_1^2$. Substituting for $\chi_1^2$, we have

\[ F = \frac{\sigma_1^2 \, (x_1^2 / \sigma_1^2)}{s_2^2} = \frac{x_1^2}{s_2^2} \tag{14.10} \]

If $\sigma_1^2 = \sigma_2^2 = \sigma^2$, we may regard $s_2^2$ as an estimate of the common
variance, $\sigma^2$. Now since the $x_1$ values are random normal deviates from the
assumed common population, we may drop the subscripts and have
$F = x^2/s^2$ with $n_1$ = 1 and $n_2$ = n, where n is the df for $s^2$. The square root
of F becomes x/s, or the ratio obtained by dividing a normally distributed
variate by an unbiased estimate of its standard deviation. Since this
corresponds to one definition of t, we have $F = t^2$ when $n_1$ = 1 and $n_2$ = n.
All the entries in the first column of Table F, with the exception of the $\infty$ lines,
are $t^2$ values. The exceptions are $z^2$, or $(x/\sigma)^2$, values of the unit normal
distribution.
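A quick numerical check of these relations may be helpful. The sketch below assumes two familiar two-tailed .05 critical values taken from standard tables rather than from this text: 1.96 for the unit normal deviate and 2.228 for t with 10 degrees of freedom. Squaring them should reproduce the first-column entries of an F table at the .05 level for $n_2 = \infty$ and $n_2$ = 10 respectively.

```python
# Two-tailed .05 critical values from standard tables (assumed, not computed here).
z_05 = 1.96          # unit normal deviate
t_05_df10 = 2.228    # t with 10 degrees of freedom

# F = z^2 when n1 = 1 and n2 is infinite; F = t^2 when n1 = 1 and n2 = n.
f_n1_1_n2_inf = z_05 ** 2
f_n1_1_n2_10 = t_05_df10 ** 2

print(round(f_n1_1_n2_inf, 2))  # 3.84
print(round(f_n1_1_n2_10, 2))   # 4.96
```

Note that the two-tailed point of t or z corresponds to the one-tailed tabled point of F, because squaring folds both tails of the symmetric distribution into the upper tail of F.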



Chapter 15 

ANALYSIS OF VARIANCE: 

SIMPLE 


The F or variance ratio defined in Chapter 14 is applicable in a wide 
variety of situations. The general requirement is that we have two 
independent estimates of variance, which estimates are, on the basis of the 
null hypothesis, regarded as estimates of the same population value. If F 
is sufficiently large, the null hypothesis becomes suspect, and we draw a 
positive conclusion, the nature of which depends on the given situation. 
Each application in this and the following chapter requires an assumption 
of normality and an assumption of homogeneity of certain variances; 
normality of what, and homogeneity of which variances, will need to be 
specified for each type of situation. 

Although these assumptions are incorporated in the mathematical 
derivation of the F distribution, there is ample evidence that marked 
skewness, departures from normal kurtosis, and extreme differences in 
variance (of the order 1 to 4 to 9—it is not the numerical differences but the 
relative sizes of the variances that are pertinent) do not greatly disrupt the 
F test as a basis for judging significance in the analysis of variance. The 
error introduced by violations of assumptions is such that Fs may reach 
the .05 level about .06 or .07 of the time, and the .01 level about .02 of the 
time.* If the investigator wishes to have some assurance that he is not 
risking the making of the type I error more often than his chosen level for 
judging significance, he may wish to adopt a somewhat more rigorous 
level: requiring a computed F to reach the .01 level provides a very safe 
base for claiming significance at the .02 level. 

* See the study by Norton, reported in Lindquist, E. F., Design and analysis of 
experiments, New York: Houghton Mifflin, 1953. 
It will be recalled that under certain circumstances the squared correla¬
tion coefficient is interpretable in terms of the proportion of variance 
“explained.” The idea is that variation can be broken down into com¬ 
ponent parts in such a way as to permit specification of the relative 
importance of the component sources. Back of this is the fact that vari¬ 
ances are additive to a total variance, as shown when we derived formula 
(9.10), which is basic to the so-called variance theorem. Although this 
theorem is fundamental to the analysis of variance technique, it is not our 
aim to consider methods of estimating the proportion or percentage of 
variance due to a given source but rather to discuss ways of testing whether 
a possible source is contributing to the total variance to a statistically 
significant degree. 

BREAKDOWN OF SUM OF SQUARES 

Let us begin with the simple situation in which the total variation for a
set of scores based on N individuals is possibly due in part to the fact that
the total group is heterogeneous with respect to some factor, such as
socioeconomic level or age or racial origin or type of treatment or method
used in memorizing or varying level of illumination; in short, any factor which
permits breaking down the total group into subgroups. In other words, the
individuals or their scores can be classified into subgroups, or the total
group can be regarded as made up of specified subgroups. For simplicity,
let us assume that the subgroups are of the same size, say m cases per
group, and that we have G groups. Let g stand for any subgroup; i.e.,
g takes on values of 1, 2, 3, ..., G, and let the mean scores for the groups be
specified as $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_g, \ldots, \bar{X}_G$, with $\bar{X}$ as the mean for all groups
combined (total mean). Although it is possible to use a precise notation
such as $X_{ig}$ to denote the score of any, the ith, person in group g, we shall
in this chapter simply use X as the score for any individual.

We are now in a position to write an individual’s score as a deviation
from the total mean in terms of the deviation of his score from his group
mean and the deviation of the group mean from the total mean. Thus,
for a score in group g,

\[ (X - \bar{X}) = (X - \bar{X}_g) + (\bar{X}_g - \bar{X}) \tag{15.1} \]

which indicates two sources of variation: the variation of a group mean
from the total mean and the variation of an individual’s score from his
group mean.

If we rewrite formula (15.1) specifically for group 1, we have

\[ (X - \bar{X}) = (X - \bar{X}_1) + (\bar{X}_1 - \bar{X}) \]

Squaring both sides gives

\[ (X - \bar{X})^2 = (X - \bar{X}_1)^2 + (\bar{X}_1 - \bar{X})^2 + 2(\bar{X}_1 - \bar{X})(X - \bar{X}_1) \]

as the squared deviation, from the total mean, of any score in group 1.
Each of the m persons in the group will have such a squared deviation
score. We may indicate the sum of the squares for the m cases as

\[ \sum (X - \bar{X})^2 = \sum (X - \bar{X}_1)^2 + \sum (\bar{X}_1 - \bar{X})^2 + 2(\bar{X}_1 - \bar{X}) \sum (X - \bar{X}_1) \]

Note that in the last term the constants 2 and $(\bar{X}_1 - \bar{X})$ have been taken
from under the summation sign, and that $\sum (X - \bar{X}_1)$, being the sum of
deviations of a set of scores about their own mean, will be exactly zero.
Therefore, the last term vanishes. Note also that the second right-hand
term involves summing a constant, which is the same as multiplying it by
the number of cases involved in the summation, i.e.,

\[ \sum (\bar{X}_1 - \bar{X})^2 = m(\bar{X}_1 - \bar{X})^2 \]

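Thus we see that we may write the sum of squares (of deviations) for
the first group and by analogy for the other groups as follows:

\[
\begin{aligned}
\text{1st group:} \quad & \sum (X - \bar{X})^2 = \sum (X - \bar{X}_1)^2 + m(\bar{X}_1 - \bar{X})^2 \\
\text{2nd group:} \quad & \sum (X - \bar{X})^2 = \sum (X - \bar{X}_2)^2 + m(\bar{X}_2 - \bar{X})^2 \\
\text{gth group:} \quad & \sum (X - \bar{X})^2 = \sum (X - \bar{X}_g)^2 + m(\bar{X}_g - \bar{X})^2 \\
\text{Gth group:} \quad & \sum (X - \bar{X})^2 = \sum (X - \bar{X}_G)^2 + m(\bar{X}_G - \bar{X})^2
\end{aligned}
\]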

If we summed the left-hand parts of the foregoing, we would obviously
have the sum of squares of deviations for the entire set of N = mG cases.
This summing of sums, or double summation, can be conveniently
indicated by using two summation signs, or $\sum\sum (X - \bar{X})^2$. We may sum
the right-hand terms separately. The first term on the right involves
summing sums, and the result can be indicated symbolically by
$\sum_g \sum (X - \bar{X}_g)^2$, which implies that we first sum for each group, then sum
over all groups. The first summation sign indicates that the subscript g
takes in turn values running from 1 to G. The sum of the other right-hand
terms can be written as $m \sum_g (\bar{X}_g - \bar{X})^2$.

Since adding of equations leads to an equation, we have

\[ \sum\sum (X - \bar{X})^2 = \sum_g \sum (X - \bar{X}_g)^2 + m \sum_g (\bar{X}_g - \bar{X})^2 \tag{15.2} \]

as a means of expressing the fact that the total sum of squares (of
deviations) can be broken down into two components, the first of which has to
do with variation about group means, i.e., within groups, and the second
of which involves variation of group means about the total mean, i.e.,
between groups. In other words, the total sum of squares is made up of
two additive parts. If we divide both sides by N or mG, we have the total
variance broken into additive components, but for our present purposes
we shall need unbiased estimates of variance, and hence it becomes
necessary to divide through by degrees of freedom.
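The additivity expressed by formula (15.2) is easily verified numerically. The following Python sketch uses hypothetical scores for three groups of three cases each; the function name is likewise only illustrative.

```python
def sums_of_squares(groups):
    """Return (total, within, between) sums of squares for a list of groups."""
    scores = [x for group in groups for x in group]
    total_mean = sum(scores) / len(scores)
    group_means = [sum(group) / len(group) for group in groups]
    # Total: deviations of every score from the total mean.
    ss_total = sum((x - total_mean) ** 2 for x in scores)
    # Within: deviations of every score from its own group mean.
    ss_within = sum(
        (x - g_mean) ** 2
        for group, g_mean in zip(groups, group_means)
        for x in group
    )
    # Between: m times the squared deviation of each group mean from the total mean.
    ss_between = sum(
        len(group) * (g_mean - total_mean) ** 2
        for group, g_mean in zip(groups, group_means)
    )
    return ss_total, ss_within, ss_between

# Hypothetical data: G = 3 groups, m = 3 cases per group.
groups = [[3.0, 5.0, 4.0], [6.0, 8.0, 7.0], [9.0, 11.0, 10.0]]
ss_total, ss_within, ss_between = sums_of_squares(groups)
print(ss_total, ss_within, ss_between)  # 60.0 6.0 54.0
```

Here 60 = 6 + 54, so the total sum of squares is exactly the within-groups sum plus the between-groups sum.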

The correct df can be ascertained by examining the three sums of squares.
For the total sum of squares we have one restriction, the total mean, and as
seen in Chapter 7 the df will be N - 1 or mG - 1. The within-groups sum
is based on N or mG squares, but since these are about G different means
there are G restrictions, or mG - G (= N - G) degrees of freedom. The
last or between-groups sum involves G means, varying more or less about
the total mean; thus, aside from the m factor, it contains G squares with
one restriction, and the df becomes G - 1. In other words, the G means
are analogous to varying scores, and obviously the mean of these means
will equal the total mean.

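We may indicate the division of the three sums of squares by the proper
dfs as follows:

\[ \frac{\sum\sum (X - \bar{X})^2}{mG - 1}, \qquad \frac{\sum_g \sum (X - \bar{X}_g)^2}{mG - G}, \qquad \frac{m \sum_g (\bar{X}_g - \bar{X})^2}{G - 1} \]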


Notice that we are no longer dealing with an equation. Why? Each
division will result in a variance estimate, but these are not directly
additive, which means that we cannot specify what proportion of the
estimated total variance is due to the between-groups variation. The
reader should note, however, that the dfs are additive:

(mG - 1) = (mG - G) + (G - 1)

Before examining the meaning of these three variance estimates, let us
label them: $s_t^2$ for the estimate of total variance, $s_w^2$ for that based on the
within-groups sum of squares, $s_b^2$ for that based upon between groups.
Variance estimates are sometimes referred to as mean squares.

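It is of interest to note that $s_w^2 = \sum_g \sum (X - \bar{X}_g)^2 / (mG - G)$ can be written as

\[ s_w^2 = \frac{1}{G} \sum_g \left[ \frac{\sum (X - \bar{X}_g)^2}{m - 1} \right] \]

which indicates explicitly that $s_w^2$ may be regarded as the average of G
estimates of the within-groups variance.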
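The three variance estimates, and the fact that $s_w^2$ is the average of the G separate within-group estimates, can be illustrated with a short Python sketch. The scores are hypothetical, the function name is an assumption made for illustration, and equal group sizes are assumed as in the text.

```python
from statistics import variance

def variance_estimates(groups):
    """Return (s_t2, s_w2, s_b2) for G groups, each of the same size m."""
    G = len(groups)
    m = len(groups[0])
    if any(len(group) != m for group in groups):
        raise ValueError("this sketch assumes equal group sizes")
    scores = [x for group in groups for x in group]
    total_mean = sum(scores) / (m * G)
    group_means = [sum(group) / m for group in groups]
    ss_total = sum((x - total_mean) ** 2 for x in scores)
    ss_within = sum(
        (x - g_mean) ** 2
        for group, g_mean in zip(groups, group_means)
        for x in group
    )
    ss_between = m * sum((g_mean - total_mean) ** 2 for g_mean in group_means)
    # Divide each sum of squares by its own degrees of freedom.
    return ss_total / (m * G - 1), ss_within / (m * G - G), ss_between / (G - 1)

groups = [[3.0, 5.0, 4.0], [6.0, 8.0, 7.0], [9.0, 11.0, 10.0]]
s_t2, s_w2, s_b2 = variance_estimates(groups)

# Average of the G separate unbiased variance estimates, one per group.
mean_group_variance = sum(variance(group) for group in groups) / len(groups)

print(s_t2, s_w2, s_b2)        # 7.5 1.0 27.0
print(mean_group_variance)     # 1.0, the same as s_w2
```

The output also shows why the three mean squares are not additive: 1.0 + 27.0 is not 7.5, even though the dfs (6 + 2 = 8) and the sums of squares are.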
MEANING OF VARIANCE ESTIMATES

[...] samples from G [...] composite [...] if the population variances are
equal; if the G groups [...] this within-groups variance estimate [...]
estimates are based on practically the same [...] either direction, will
tend to make both [...] and [...] large. If m or the [...] let us next look
at [...] accomplished by dividing the sum [...] by G - 1 [...] degrees of
freedom, hence [...] we may write the following for


\[ s_b^2 = \frac{m \sum (\bar{X}_g - \bar{X})^2}{G - 1} = m s_{\bar{x}}^2 \]


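In order to understand the meaning of $s_{\bar{x}}^2$, we may regard our G means as
a sample of sample means from a sampling distribution of
sample means for groups drawn from the same population. The variance
of this distribution of means is given by the square of the standard error
of the mean, i.e., $\sigma_{\bar{x}}^2 = \sigma^2/m$. If we were given the value of $\sigma_{\bar{x}}^2$ and told to
determine the variance $\sigma^2$, we would multiply $\sigma_{\bar{x}}^2$ by m. Actually, we have
only an estimate of $\sigma_{\bar{x}}^2$, namely $s_{\bar{x}}^2$, so that $m s_{\bar{x}}^2$, or $s_b^2$, is an estimate of the
population variance, $\sigma^2$. Thus, for groups drawn from the same population,
we have two estimates of $\sigma^2$: $s_w^2$ and $s_b^2$.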

These estimates should agree within the limits of chance, and being
independent estimates of the same variance, the sampling distribution of
their ratio is that of the F distribution. When an obtained F, or $s_b^2/s_w^2$,
is larger than expected on the basis of chance sampling, the implication
is that $s_b^2$ is greater than expected on the basis of chance sampling, hence
that there are real differences among the G means. If the null hypothesis of
no difference is true, we expect that in the long run $s_b^2$ will tend to equal $\sigma^2$;
that is, the average or expected value of $s_b^2$ is $\sigma^2$. Suppose that the null
hypothesis is not true and we ask about the mean or expected value of $s_b^2$.
An expected value is defined as the mean of an infinite number of obtained
values.

Now

\[ s_b^2 = \frac{m \sum (\bar{X}_g - \bar{X})^2}{G - 1} \]

in which $\bar{X}_g$ is a sample mean from the gth group. It seems reasonable to
say that the variation among $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_g, \ldots, \bar{X}_G$ will have two
sources under nonnull conditions: the extent of the variation among
the corresponding G population means, $\mu_1, \mu_2, \ldots, \mu_g, \ldots, \mu_G$, plus a
random sampling component because each $\bar{X}_g$ is based on a sample of
m cases. If we symbolized the sampling error by $E_g$, we would have
$E_g = \bar{X}_g - \mu_g$, which leads to $\bar{X}_g = \mu_g + E_g$. (This is similar to the
conception of a score X as being composed of a true part $X_t$ and an error
E so that $X = X_t + E$.)

We seek an expression for the deviation $(\bar{X}_g - \bar{X})$ which will incorporate
the two sources of variation. The random error part can be expressed as
$(\bar{X}_g - \mu_g)$ and the possible differences among the population means as
$(\mu_g - \mu)$, in which $\mu$ is the mean of the population means. Thus we could
write

\[ (\bar{X}_g - \bar{X}) = (\mu_g - \mu) + (\bar{X}_g - \mu_g) - (\bar{X} - \mu) \tag{15.3} \]

in which the third term on the right is a (nuisance) variable which must
be incorporated to make the equation balance, i.e., to provide an identity.
To obtain an expression for $s_b^2$ (aside from the m factor) we could square
and sum either side of the foregoing identity. If we did this we would
discover that the $(\bar{X} - \mu)$ term is really a nuisance. There is a trick that
will help, but before considering the squaring and summing further, let us
set up a scheme which will eventually aid in ascertaining the mean
(expected) value of $s_b^2$. Let us imagine that we have R replications or sets
of data, each set leading to G means based on m cases. We might arrange
the obtained means in a table (Table 15.1) consisting of R rows and G
columns, each mean having two subscripts, the first of which designates
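the row, the second the column. Thus $\bar{X}_{36}$ would be the mean in the third
row (set or replication) and the sixth column (group). Note the notation
for the mean of means (M of Ms) along the bottom and right margin.
The M of Ms on the right are obtained by summing across columns
(groups); the dot replaces the second subscript. For the M of Ms at the
bottom the summing is over rows (replications), hence the dot replaces the
first subscript. Averaging the M of Ms along the right or those at the
bottom leads to the $\bar{X}_{..}$ at the lower-right corner. The $\mu$s at the bottom
hold for the G populations.

Table 15.1. Means for G groups, with R replications

\[
\begin{array}{l|cccccc|c}
\text{Set} & 1 & 2 & \cdots & g & \cdots & G & \text{M of Ms} \\
\hline
1 & \bar{X}_{11} & \bar{X}_{12} & \cdots & \bar{X}_{1g} & \cdots & \bar{X}_{1G} & \bar{X}_{1.} \\
2 & \bar{X}_{21} & \bar{X}_{22} & \cdots & \bar{X}_{2g} & \cdots & \bar{X}_{2G} & \bar{X}_{2.} \\
\vdots & & & & & & & \\
r & \bar{X}_{r1} & \bar{X}_{r2} & \cdots & \bar{X}_{rg} & \cdots & \bar{X}_{rG} & \bar{X}_{r.} \\
\vdots & & & & & & & \\
R & \bar{X}_{R1} & \bar{X}_{R2} & \cdots & \bar{X}_{Rg} & \cdots & \bar{X}_{RG} & \bar{X}_{R.} \\
\hline
\text{M of Ms} & \bar{X}_{.1} & \bar{X}_{.2} & \cdots & \bar{X}_{.g} & \cdots & \bar{X}_{.G} & \bar{X}_{..} \\
\text{Pop. Ms} & \mu_{.1} & \mu_{.2} & \cdots & \mu_{.g} & \cdots & \mu_{.G} & 
\end{array}
\]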

To rewrite (15.3) for our first set (or row) we need only insert the
subscript 1 to designate the first row, along with appropriate dots as part
of the subscript notation. Thus,

\[ (\bar{X}_{1g} - \bar{X}_{1.}) = (\mu_{.g} - \mu_{..}) + (\bar{X}_{1g} - \mu_{.g}) - (\bar{X}_{1.} - \mu_{..}) \tag{15.4} \]

or more generally, we have

\[ (\bar{X}_{rg} - \bar{X}_{r.}) = (\mu_{.g} - \mu_{..}) + (\bar{X}_{rg} - \mu_{.g}) - (\bar{X}_{r.} - \mu_{..}) \tag{15.5} \]

as a sort of model for expressing the deviation of the gth mean in the rth
replication, $\bar{X}_{rg}$, from the over-all mean of the rth replication, $\bar{X}_{r.}$, given on
the left of the equality sign, in terms of the three components on the right
side. (Note that writing the mean of the gth population as $\mu_{.g}$ indicates
that its value is independent of the first subscript.)

When we square both sides of (15.4) and sum over groups we will obtain
for the right side three terms involving squares and three involving
cross-products. One of these cross-product terms,
$-2 \sum_g (\bar{X}_{1g} - \mu_{.g})(\bar{X}_{1.} - \mu_{..})$,
turns out to be unmanageable. We can avoid this difficult term by shifting
the nuisance component, $(\bar{X}_{1.} - \mu_{..})$, to the left, rewriting (15.4) as

\[ (\bar{X}_{1g} - \bar{X}_{1.}) + (\bar{X}_{1.} - \mu_{..}) = (\mu_{.g} - \mu_{..}) + (\bar{X}_{1g} - \mu_{.g}) \]

Squaring both sides and summing over g, we have

\[
\begin{aligned}
\sum_g (\bar{X}_{1g} - \bar{X}_{1.})^2 &+ G(\bar{X}_{1.} - \mu_{..})^2 + 2(\bar{X}_{1.} - \mu_{..}) \sum_g (\bar{X}_{1g} - \bar{X}_{1.}) \\
&= \sum_g (\mu_{.g} - \mu_{..})^2 + \sum_g (\bar{X}_{1g} - \mu_{.g})^2 + 2 \sum_g (\mu_{.g} - \mu_{..})(\bar{X}_{1g} - \mu_{.g})
\end{aligned}
\]

Note that since the second term on the left side involves summing a
constant G times, the summation sign was replaced by G; note also that
in the next term, a constant is taken from under the summation sign. Note
further that in this same third term, the expression $\sum_g (\bar{X}_{1g} - \bar{X}_{1.})$ is zero,
hence this cross-product term vanishes.

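Omitting the vanishing term and shifting the second term on the left to
the right-hand side, we may write the equation for the first set, and by
analogy for any set (the rth) and then the Rth (for a total of R replications).

\[
\begin{aligned}
\sum_g (\bar{X}_{1g} - \bar{X}_{1.})^2 &= \sum_g (\mu_{.g} - \mu_{..})^2 + \sum_g (\bar{X}_{1g} - \mu_{.g})^2 - G(\bar{X}_{1.} - \mu_{..})^2 + 2 \sum_g (\mu_{.g} - \mu_{..})(\bar{X}_{1g} - \mu_{.g}) \\
\sum_g (\bar{X}_{rg} - \bar{X}_{r.})^2 &= \sum_g (\mu_{.g} - \mu_{..})^2 + \sum_g (\bar{X}_{rg} - \mu_{.g})^2 - G(\bar{X}_{r.} - \mu_{..})^2 + 2 \sum_g (\mu_{.g} - \mu_{..})(\bar{X}_{rg} - \mu_{.g}) \\
\sum_g (\bar{X}_{Rg} - \bar{X}_{R.})^2 &= \sum_g (\mu_{.g} - \mu_{..})^2 + \sum_g (\bar{X}_{Rg} - \mu_{.g})^2 - G(\bar{X}_{R.} - \mu_{..})^2 + 2 \sum_g (\mu_{.g} - \mu_{..})(\bar{X}_{Rg} - \mu_{.g})
\end{aligned}
\]

The addition of these equations will lead to a new equation for which
we will need double summation signs, summing over g from 1 to G and
over r from 1 to R. Thus,

\[
\underbrace{\sum_r \sum_g (\bar{X}_{rg} - \bar{X}_{r.})^2}_{A}
= \underbrace{R \sum_g (\mu_{.g} - \mu_{..})^2}_{B}
+ \underbrace{\sum_r \sum_g (\bar{X}_{rg} - \mu_{.g})^2}_{C}
- \underbrace{G \sum_r (\bar{X}_{r.} - \mu_{..})^2}_{D}
+ \underbrace{2 \sum_r \sum_g (\mu_{.g} - \mu_{..})(\bar{X}_{rg} - \mu_{.g})}_{E}
\]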

To facilitate further consideration, the five terms have been designated
by letters. It will be recalled that when we divide $m \sum (\bar{X}_g - \bar{X})^2$ by its df,
G - 1, we get $s_b^2$, which may be thought of as $m s_{\bar{x}}^2$. When we divide the
exact equivalent, $m \sum_g (\bar{X}_{rg} - \bar{X}_{r.})^2$, by G - 1 we get an $s_b^2$ for the rth set,
and this $s_b^2$ might be written as $m s_{\bar{x}(r)}^2$. But note that in the foregoing R
equations we did not (and need not) have m, hence the division of
$\sum_g (\bar{X}_{rg} - \bar{X}_{r.})^2$ by G - 1 gives $s_{\bar{x}(r)}^2$.

Our plan of attack is first to note that when each of the R $\sum_g$ parts of
term A is divided by G - 1, we have a variance estimate, $s_{\bar{x}(r)}^2$, for each of
the R replications. If we next sum these (varying) estimates and divide by
R we will have the mean of the R estimates. Then if we think of R as
approaching infinity or as infinitely large, we will have the mean (or
expected) value of the estimate. This process of first dividing $\sum_g (\bar{X}_{rg} - \bar{X}_{r.})^2$
by G - 1, then summing over r, followed by a division by R, can be stated
as follows:

\[ \frac{\sum_r \sum_g (\bar{X}_{rg} - \bar{X}_{r.})^2}{R(G - 1)} = \frac{\sum_r \left[ \dfrac{\sum_g (\bar{X}_{rg} - \bar{X}_{r.})^2}{G - 1} \right]}{R} = \frac{\sum_r s_{\bar{x}(r)}^2}{R} \]
By definition, as R becomes infinitely large we have the expected value of
the variance estimate, $s_{\bar{x}}^2$. But we wish to express this in terms of
two parts, one of which reflects a random error in the observed means
whereas the other reflects possible real differences among the G means.
To do this we will need to work with the right-hand side of the above
five-term equation in order to see what happens when the four right-hand
terms are also divided by R(G - 1) and R is allowed to approach infinity.
Obviously, such a division will maintain the equation. When dividing
it is necessary that we break up R(G - 1) so as to evaluate a given term as
R becomes infinitely large.

For the B term, we have

\[ \frac{R \sum_g (\mu_{.g} - \mu_{..})^2}{R(G - 1)} = \frac{\sum_g (\mu_{.g} - \mu_{..})^2}{G - 1} \]

which is a fixed quantity regardless of R.

For the C term, we may write

\[ \frac{\sum_r \sum_g (\bar{X}_{rg} - \mu_{.g})^2}{R(G - 1)} = \frac{\sum_g \left[ \sum_r (\bar{X}_{rg} - \mu_{.g})^2 / R \right]}{G - 1} \]

This involves for any one of the G, say the gth group, the squared
deviations of a series of sample means, each based on m cases, about the
population mean for the group. For R infinitely large, the sum of these
squared deviations divided by R is the true (or theoretical population)
variance of sample means, hence C becomes

\[ \frac{\sum_g \sigma_{\bar{x}_g}^2}{G - 1} = \frac{\sum_g (\sigma_g^2 / m)}{G - 1} = \frac{G}{G - 1} \cdot \frac{\sigma^2}{m} \]

in which the sampling variance of the means from the gth group, $\sigma_{\bar{x}_g}^2$,
has been replaced by the familiar formula for the sampling variance of
means in terms of population score variance, $\sigma_g^2$, and sample size, m.

It may help the student to note that we are dealing with the distribution
of sample means in any column, the gth, in Table 15.1. The last step in
the foregoing procedure involves the assumption of homogeneity of
variances. When $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_G^2 = \sigma^2$, the summing over g of the G
variances, all equal, is nothing more than G times $\sigma^2$, the common
population variance.

For the D term, we have (ignoring its negative sign, which is to be picked
up later)

\[ \frac{G \sum_r (\bar{X}_{r.} - \mu_{..})^2}{R(G - 1)} = \frac{G}{G - 1} \cdot \frac{\sum_r (\bar{X}_{r.} - \mu_{..})^2}{R} \]

which, as R becomes infinitely large, involves the sampling variance
of the means of means along the right-hand margin of Table 15.1. This
variance we will symbolize as $\sigma_{\bar{x}(r)}^2$. Since each $\bar{X}_{r.}$ is
based on mG cases, we could easily jump to the mistaken conclusion that
this variance of sample means is a function of the variance of the mG
scores about $\bar{X}_{r.}$ and would therefore depend on the variance within the G
groups plus possible variation among the G groups. Note that although
it is presumed that the m cases of group g have been randomly drawn
from population g, it does not follow that the mG cases have been drawn
randomly from a grand total population made up by combining the G
subpopulations. Instead, the sampling process ensures that each group is
equally represented by m cases, which would not necessarily be true if mG
cases were randomly drawn from the grand total population without the
provision of equal representation. (This involves the concept of stratified
sampling, to be discussed briefly in Chapter 20.)

To evaluate the variance of the $\bar{X}_{r.}$ about $\mu_{..}$, let us note that

\[ \bar{X}_{r.} = (\bar{X}_{r1} + \bar{X}_{r2} + \cdots + \bar{X}_{rg} + \cdots + \bar{X}_{rG})/G \]

which indicates that $\bar{X}_{r.}$ as a random variable is made up by adding G
variables of the form $\bar{X}_{rg}$, the means in the rth row of Table 15.1. With
R infinitely large the $\bar{X}_{r1}, \bar{X}_{r2}, \ldots, \bar{X}_{rg}, \ldots, \bar{X}_{rG}$ will be independent random
variables (zero correlation between all possible pairs of columns in
Table 15.1), hence the variance of the sum in the preceding parentheses
is given by the sum of the variances for the random variables being
summed, and to get the variance of the averages, $\bar{X}_{r.}$, we need only divide
by $G^2$. Thus

\[ \sigma_{\bar{x}(r)}^2 = (\sigma_{\bar{x}_{r1}}^2 + \sigma_{\bar{x}_{r2}}^2 + \cdots + \sigma_{\bar{x}_{rg}}^2 + \cdots + \sigma_{\bar{x}_{rG}}^2)/G^2 \]

but each of the G variances is the sampling variance for the means for a
particular column in Table 15.1, which for infinite R will be nothing more
than $\sigma^2/m$, with $\sigma^2$ being the (assumed) common variance for the G
populations. When we sum G of these, we will have $G\sigma^2/m$, so that

\[ \sigma_{\bar{x}(r)}^2 = \frac{G\sigma^2/m}{G^2} = \frac{\sigma^2}{mG} \]

Hence the value for the D term becomes

\[ \frac{G}{G - 1} \, \sigma_{\bar{x}(r)}^2 = \frac{G}{G - 1} \cdot \frac{\sigma^2}{mG} = \frac{\sigma^2}{m(G - 1)} \]

As for the last, or E, term of our five terms, i.e.,

\[ 2 \sum_r \sum_g (\mu_{.g} - \mu_{..})(\bar{X}_{rg} - \mu_{.g}) \]

let us note that the contribution of group 1 to this double sum would be

\[ 2 \sum_r (\mu_{.1} - \mu_{..})(\bar{X}_{r1} - \mu_{.1}) \]

which, since $(\mu_{.1} - \mu_{..})$ is a constant, may be written as

\[ 2(\mu_{.1} - \mu_{..}) \sum_r (\bar{X}_{r1} - \mu_{.1}) \]

Now with R infinitely large, the mean of the $\bar{X}_{r1}$ will equal $\mu_{.1}$, hence
we have the sum of a set of deviations about their own mean. It will be
recalled that such a sum is always zero, a fact which will likewise hold for
any and all values of g; therefore, the E term, when divided by infinite R,
vanishes; we need not divide by the G - 1 of the R(G - 1) divisor.

Let us now bring together the results of dividing the five terms by R(G - 1)
with R becoming infinitely large:

$$\frac{\sum_r s^2_{\bar{X}(r)}}{R} = \frac{G\sigma^2}{m(G-1)} + \frac{\sum_g(\mu_{\cdot g}-\mu_{\cdot\cdot})^2}{G-1} - \frac{2\sigma^2}{m(G-1)} + \frac{\sigma^2}{m(G-1)} + 0$$

Multiplying both sides by m gives

$$\frac{m\sum_r s^2_{\bar{X}(r)}}{R} = \frac{G\sigma^2}{G-1} + \frac{m\sum_g(\mu_{\cdot g}-\mu_{\cdot\cdot})^2}{G-1} - \frac{2\sigma^2}{G-1} + \frac{\sigma^2}{G-1}$$

Placing the m of the left-hand side under the summation sign, we note
that the terms in $\sigma^2$ combine to $\sigma^2$, and we rearrange, giving

$$\frac{\sum_r m s^2_{\bar{X}(r)}}{R} = \sigma^2 + \frac{m\sum_g(\mu_{\cdot g}-\mu_{\cdot\cdot})^2}{G-1}$$

That is, the mean of an infinite number of $ms^2_{\bar{X}}$'s (or of the exact equivalent,
$s_b^2$) is given on the right-hand side of the last equation; the right side
gives the mean or expected value of $s_b^2$. Stated differently, we see that
$s_b^2$ as an estimate includes the two components on the right. If we let
$\to$ stand for "is an estimate of," then

$$s_b^2 \to \sigma^2 + \frac{m\sum_g(\mu_{\cdot g}-\mu_{\cdot\cdot})^2}{G-1}$$

and also

$$s_w^2 \to \sigma^2$$

which helps us see that $F = s_b^2/s_w^2$ is a test for the presence of real differences
among the $\mu_{\cdot g}$, or the G population means.

It is necessary to distinguish between two differing situations as regards the
term involving the $\mu_{\cdot g}$ values. If the G groups represent, say, G schools
drawn at random from a possible population of schools, we have the
so-called random model, whereas if the G groups are the two sex groups or
five defined socioeconomic groups or groups working under two or more
experimental conditions, etc., we have the fixed effects model. Note that
for the latter model, the number of groups is typically small and fixed.
Even though we were defining G experimental conditions as, for example,
G differing degrees of illumination, we would not draw the G levels at
random from the theoretically possible large number of levels. Instead,
we would deliberately select G levels so they would be spaced along the
illumination continuum. If we were interested in the effect of sense
modality on reaction time, the number of possible sense modalities which
we could use is fixed. The fixed effects model is sometimes called the
fixed constants model because the G values of $\mu_{\cdot g}$ are constants: with
no sampling of groups, exactly the same population means are involved
for each replication. This would not be the case when replication of the
experiment involved, for example, drawing another sample of schools.

For this chapter we need not worry further about the models except
to note that for the random model it makes sense to replace the term
containing the random $\mu_{\cdot g}$ by $m\sigma_\mu^2$, because the sum of squares is being
divided by (G - 1) degrees of freedom and hence is an unbiased estimate
of the variance of the population means based on a sample of size G. The
use of such a symbol when only a few (2 or 3, or more) definitely fixed $\mu_{\cdot g}$
are involved does not provide a very meaningful description of the variation
among them.

The student who got lost in the foregoing rather tedious, though
rigorous, method for determining the expected value of $s_b^2$ might like a
more intuitive and more easily understandable approach, which is restricted
to the random model but has similar implication for the fixed effects model.
Recall that when considering the reliability of measurement problem we
regarded the variable score, X, as made up of two variable parts, $X_t$ and E,
so that $X = X_t + E$, whence we had $S_X^2 = S_t^2 + S_E^2$. By analogy we may
say that $\bar{X}_g$ as a random variable is made up of two parts, $\mu_g$ and sampling
error, $E_g$ ($= \bar{X}_g - \mu_g$). Thus $\bar{X}_g = \mu_g + E_g$, and if we had a sample mean
$\bar{X}_g$ from each of the possible populations, the variance of the distribution
of these means would be

$$\sigma^2_{\bar{X}_g} = \sigma_\mu^2 + \frac{\sigma^2}{m}$$

in which the last term is the sampling error component (each mean is based
on a sample of m cases). Multiplying both sides by m gives

$$m\sigma^2_{\bar{X}_g} = \sigma^2 + m\sigma_\mu^2$$

Therefore, since m times the variance of the means, when all populations
are represented, can be broken into two components, it follows that the
estimate ($s_b^2 = ms^2_{\bar{X}}$) will also be subject to the same two sources of
variation.

In practice we do not have a priori knowledge as to whether the second
component, expressed either as $m\sum(\mu_{\cdot g}-\mu_{\cdot\cdot})^2/(G-1)$ or as $m\sigma_\mu^2$, is or
is not zero. What we have are two estimates of variance, $s_b^2$ and $s_w^2$.
If $s_b^2$ is significantly larger than $s_w^2$, i.e., if $F = s_b^2/s_w^2$ is beyond
what chance would be expected to produce, it can be argued that there is a
source of variation over and above that of random sampling errors in the
observed means; hence the second component is real.

Although the use of the table of F requires that the larger of the two estimates be
used as the numerator in computing the variance ratio, it should be noted
that $s_w^2$ cannot be significantly larger than $s_b^2$ unless the operation of chance
sampling has been restricted in some manner. In practical applications
we are primarily and nearly always interested in the case in which $s_b^2$ is the
larger of the two estimates. If it is smaller than $s_w^2$, it is ordinarily not
necessary to compute F.

We may now summarize the foregoing. When we have scores on G
groups of m cases each, the total sum of squares can be broken down into
two additive parts, that for between and that for within groups. Dividing
by the appropriate degrees of freedom, the within sum of squares gives $s_w^2$
as an estimate of the trait variance for the population, and $s_b^2$ ($= ms^2_{\bar{X}}$)
yields a second and independent estimate of the same population variance.
The sampling variation of the ratio of these two estimates is that of the
variance ratio, F, if the G groups belong to the same population. If $s_b^2$ is
significantly larger than $s_w^2$, which is an estimate of the population variance,
$s_b^2$ must be regarded as an estimate of the same variance plus variation
due to real, nonchance, differences between the G groups. Again, letting $\to$
stand for "is an estimate of" or "is the expected value of," we have

$$s_w^2 \to \sigma^2$$

$$s_b^2 \to \sigma^2 + \frac{m\sum(\mu_{\cdot g}-\mu_{\cdot\cdot})^2}{G-1} \qquad \text{(fixed model)}$$

$$s_b^2 \to \sigma^2 + m\sigma_\mu^2 \qquad \text{(random model)}$$

The null hypothesis is that the second component in the expected value of
$s_b^2$ is zero. Rejection of the hypothesis because $s_b^2/s_w^2$, as an F with
df or $n_1$ of G - 1 and df or $n_2$ of mG - G (or N - G), is significantly large
implies that the second component is not zero. In other words, we have a
technique which provides an over-all test of the significance of the differences
between several means considered simultaneously.
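The expected values just summarized can be checked roughly by simulation. The following is only a sketch under stated assumptions: the population means, the common standard deviation, the group size, and the number of replications are all hypothetical values chosen for illustration, and the fixed model is assumed. It draws many replications of G normal samples of m cases, averages the two variance estimates, and compares the averages with $\sigma^2 + m\sum(\mu_{\cdot g}-\mu_{\cdot\cdot})^2/(G-1)$ and $\sigma^2$.

```python
import random


def between_within(groups):
    """Return (s_b^2, s_w^2) for G equal-sized groups of raw scores."""
    G = len(groups)
    m = len(groups[0])
    means = [sum(g) / m for g in groups]
    grand = sum(means) / G
    ss_between = m * sum((mean - grand) ** 2 for mean in means)
    ss_within = sum((x - mean) ** 2 for g, mean in zip(groups, means) for x in g)
    return ss_between / (G - 1), ss_within / (G * m - G)


def simulate(mus, sigma, m, replications, seed=0):
    """Average s_b^2 and s_w^2 over many replications (fixed model)."""
    rng = random.Random(seed)
    total_b = total_w = 0.0
    for _ in range(replications):
        groups = [[rng.gauss(mu, sigma) for _ in range(m)] for mu in mus]
        s2_b, s2_w = between_within(groups)
        total_b += s2_b
        total_w += s2_w
    return total_b / replications, total_w / replications


# Hypothetical values, not taken from the text.
mus = [0.0, 1.0, 2.0, 3.0]      # fixed population means
sigma, m = 2.0, 5               # common standard deviation, cases per group
G = len(mus)
grand_mu = sum(mus) / G

expected_b = sigma ** 2 + m * sum((mu - grand_mu) ** 2 for mu in mus) / (G - 1)
expected_w = sigma ** 2

mean_b, mean_w = simulate(mus, sigma, m, replications=20000)
print("mean s_b^2:", round(mean_b, 2), "expected:", round(expected_b, 2))
print("mean s_w^2:", round(mean_w, 2), "expected:", round(expected_w, 2))
```

With a finite number of replications the averages only approximate the expected values; setting all the hypothetical means equal makes the second component vanish, so that both averages approach $\sigma^2$.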

For all the applications discussed in this chapter, it is assumed (1) that
the m cases constituting each group have been drawn from a normally
distributed population of scores and (2) that the variances of the G
populations are equal. The first assumption can be checked by evaluating
skewness and kurtosis relative to their standard errors or by the chi square
test of goodness of fit. Unfortunately neither of these checks is sensitive for
small samples. The second assumption may be evaluated, regardless of sample
size, by a test of the homogeneity of the within-group variances.

The reader will have noted that the sums of squares as defined involve
deviations of scores within groups, which lead to laborious computation.
The labor can be reduced by raw-score formulas analogous to that for the sum of
squares of deviations inherent in formula (3.6).

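Thus we would have

$$\sum\sum(X-\bar{X})^2 = \frac{1}{N}\left[N\sum\sum X^2 - \left(\sum\sum X\right)^2\right] \qquad (15.6)$$

for total sum of squares, that

$$\sum\sum(X-\bar{X}_g)^2 = \frac{1}{m}\left[m\sum\sum X^2 - \sum\left(\sum X\right)^2\right] \qquad (15.7)$$

for within sum of squares, and that

$$m\sum(\bar{X}_g-\bar{X})^2 = \frac{1}{mG}\left[G\sum\left(\sum X\right)^2 - \left(\sum\sum X\right)^2\right] \qquad (15.8)$$

for between sum of squares.

Accordingly, to compute the three sums of squares of deviations, we
need to sum all the raw scores, $\sum\sum X$, sum the squares of all the raw scores,
$\sum\sum X^2$, and sum the squares of the separate group sums, $\sum(\sum X)^2$. These
sums are readily obtained on a computing machine, as are the
values for $\sum(\sum X)^2$.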

OF D,FFER - 
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S5rSSsS=“ 

iE:aiTr^ds:ssS 

Table 15.2. Number of syllables correctly anticipated at the 34th minute of practice 

Rest interval (minutes) 

Number of trials 


1 

8 

5 

5 

5 

5 

8 

1 

2 

2 

2 

8 

4 

1 

3 


2 

3.5 


7 

4 

4 
7 

7 

5 

6 

8 

14 

8 

5 

5 

8 

5 


3 

4 

5 

2 

1.25 

0 

11 

14 

29 

9 

11 

17 

3 

12 

16 

9 

15 

18 

10 

11 

11 

5 

10 

15 

11 

8 

9 

9 

13 

18 

6 

13 

13 

7 

5 

12 

6 

7 

15 

16 

11 

8 

12 

12 

13 

11 

12 

7 

15 

9 

15 

13 

16 

15 

4 

7 

13 

16 

16 

16 

146 + 

172 + 

215 = 

,550 + 

1,982 + 3,059 = 


The required sums are given at the bottom of Table 15.2; the group means
are also given. By substituting in formulas (15.6), (15.7), and (15.8), we have

$$\sum\sum(X-\bar{X})^2 = \tfrac{1}{80}\left[80(7638) - (692)^2\right] = 1652.20$$

$$\sum\sum(X-\bar{X}_g)^2 = \tfrac{1}{16}\left[16(7638) - 110{,}778\right] = 714.38$$

$$m\sum(\bar{X}_g-\bar{X})^2 = \tfrac{1}{80}\left[5(110{,}778) - (692)^2\right] = 937.82$$

and the resulting sums of squares, dfs, and variance estimates are set forth in Table
15.3, usually referred to as a variance table.

Table 15.3. Variance table for data of Wright

Source      Sum of Squares    df    Variance Estimate
Between          937.82        4    234.46 = $s_b^2$
Within           714.38       75      9.53 = $s_w^2$
Total           1652.20       79

Note that the sums of squares for between and within groups add to the total,
which provides a check on the arithmetic involved in substituting in the formulas
but does not check on the accuracy of the sums given in Table 15.2. Note
that the degrees of freedom are also additive.

The variance ratio, or F, becomes 234.46/9.53 or 24.60. With dfs of
$n_1 = 4$ and $n_2 = 75$, we refer to the table of F to learn whether 24.60
is larger than expected on the basis of chance. That this F is highly
significant is immediately apparent when we note that for the given dfs an
F of about 5.2 is significant at the .001 level. With the between-groups
variance estimate significantly larger than that for within groups, we can
conclude with high confidence that the five sets of scores have not been
drawn from the same population of scores, or that amount of time spent
in practice is a real source of variation. This is, of course, equivalent to
saying that the several group means considered simultaneously differ
significantly.

In this example the groups can be arranged in order before
any of the data are seen, and additional credence can be placed in the
results because the means follow this ordering. It should be understood,
however, that the variance technique does not presuppose an a priori
ordering of the several groups; it is generally applicable for testing the
significance of the differences between group means regardless of any prior
ordering.

If only the z or t technique were available and we wished to compare
the means for five groups, it would ordinarily be necessary to compute a t or
z for each possible difference, and five means would lead to 5 x 4/2, or 10,
differences. Obviously, the variance method requires less computation;
furthermore, it provides an over-all test of significance which is not
subject to the fallacy inherent in singling out the comparison involving the
largest obtained t or z, a practice which is likely to capitalize on chance
differences. This problem is discussed at the end of this chapter.
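The arithmetic of the worked example can be retraced with a few lines of code. This is a minimal sketch: the function name `anova_from_sums` and its argument names are hypothetical, while the numerical inputs (692, 7638, 110,778, m = 16, G = 5) are the sums used in the substitutions above.

```python
def anova_from_sums(sum_x, sum_x2, sum_group_sums_sq, m, G):
    """Variance table from raw-score sums, equal group sizes.

    sum_x             -- sum of all raw scores
    sum_x2            -- sum of squares of all raw scores
    sum_group_sums_sq -- sum of the squared group sums
    """
    N = m * G
    total = (N * sum_x2 - sum_x ** 2) / N                       # formula (15.6)
    within = (m * sum_x2 - sum_group_sums_sq) / m               # formula (15.7)
    between = (G * sum_group_sums_sq - sum_x ** 2) / (m * G)    # formula (15.8)
    s2_b = between / (G - 1)
    s2_w = within / (N - G)
    return {
        "total": total, "within": within, "between": between,
        "df_between": G - 1, "df_within": N - G, "df_total": N - 1,
        "s2_b": s2_b, "s2_w": s2_w, "F": s2_b / s2_w,
    }


table = anova_from_sums(692, 7638, 110778, m=16, G=5)
for name, value in table.items():
    print(name, round(value, 2))
```

The F obtained from unrounded variance estimates differs slightly in the second decimal from the 24.60 obtained from the rounded estimates 234.46 and 9.53.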

SPECIAL CASE OF F TEST WHEN $n_1 = 1$

If we had G = 2 groups, the testing of the between-groups variance
would appear to be much like testing the difference between two means.
Let us examine this case by starting with the expressions for the sum of
squares for two groups:

$$\text{1st group:}\quad \sum(X-\bar{X})^2 = \sum(X-\bar{X}_1)^2 + m(\bar{X}_1-\bar{X})^2$$

$$\text{2nd group:}\quad \sum(X-\bar{X})^2 = \sum(X-\bar{X}_2)^2 + m(\bar{X}_2-\bar{X})^2$$

Instead of using double summation signs, we may indicate the within-groups
sum of squares as $\sum(X-\bar{X}_1)^2 + \sum(X-\bar{X}_2)^2$, and the between-groups
sum as $m(\bar{X}_1-\bar{X})^2 + m(\bar{X}_2-\bar{X})^2$. The respective dfs are 2m - 2 and 1.
Indicating the division of the sums of squares by
their dfs, we can write the variance ratio as

$$F = \frac{\left[m(\bar{X}_1-\bar{X})^2 + m(\bar{X}_2-\bar{X})^2\right]/1}{\left[\sum(X-\bar{X}_1)^2 + \sum(X-\bar{X}_2)^2\right]/(2m-2)}$$

Since the number of cases for the two groups is the same, it is readily seen
that the mean of one group will be exactly as far above the general mean
($\bar{X}$) as the other group mean is below $\bar{X}$, or that $\bar{X}$ will bisect the distance
between $\bar{X}_1$ and $\bar{X}_2$; therefore $(\bar{X}_1-\bar{X})^2 = (\bar{X}_2-\bar{X})^2 = \tfrac{1}{4}(\bar{X}_1-\bar{X}_2)^2$.
The numerator for F becomes $(m/2)(\bar{X}_1-\bar{X}_2)^2$. It will be noted that the
denominator term, which defines $s_w^2$, is identical to the $s^2$ defined by (7.2)
in connection with the t test. Accordingly, we may write

$$F = \frac{(m/2)(\bar{X}_1-\bar{X}_2)^2}{s^2}$$

Dividing both numerator and denominator by m/2, we have

$$F = \frac{(\bar{X}_1-\bar{X}_2)^2}{s^2\,\dfrac{2}{m}}$$

the square root of which is

$$\sqrt{F} = \frac{\bar{X}_1-\bar{X}_2}{\sqrt{\dfrac{2s^2}{m}}} = \frac{\bar{X}_1-\bar{X}_2}{s\sqrt{\dfrac{1}{m}+\dfrac{1}{m}}}$$

which is identical with a formula for t, p. 103. When G = 2 or two groups
are being compared, then $F = t^2$. It can be shown that this is also true
when the ms for the two groups are unequal. In fact, it can be shown
that when $n_1 = 1$, the sampling distribution of F becomes the same as that
of $t^2$, provided the estimate based on groups, i.e., that based on 1
degree of freedom, is used as the numerator regardless of which of the two
estimates is the larger. It is thus seen that the t test is a special case of the
F test. Note that F involves the square of the difference between means,
hence it provides a basis for judging whether the difference between means,
irrespective of direction, is significant (cf. pp. 248-49). The z technique
for comparing the means of two large samples is similarly related to F: when
the df for the denominator is not small, the square root of the tabled F for
$n_1 = 1$ is approximately the corresponding value of z (see the table of F in the
Appendix).
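The identity $F = t^2$ for two groups of equal size may be verified numerically. The scores below are hypothetical, chosen only so that the arithmetic is easy to follow; the computation itself follows the sums of squares and the pooled-variance t described above.

```python
from math import sqrt

# Hypothetical scores for two groups of m = 5 cases each.
group1 = [3, 5, 4, 6, 7]
group2 = [6, 8, 7, 9, 10]
m = len(group1)

mean1 = sum(group1) / m
mean2 = sum(group2) / m
grand = (mean1 + mean2) / 2          # equal m, so the grand mean bisects the two means

ss_within = (sum((x - mean1) ** 2 for x in group1)
             + sum((x - mean2) ** 2 for x in group2))
ss_between = m * (mean1 - grand) ** 2 + m * (mean2 - grand) ** 2

s2 = ss_within / (2 * m - 2)         # pooled variance estimate, the s^2 of the t test
F = (ss_between / 1) / s2
t = (mean1 - mean2) / sqrt(s2 * (1 / m + 1 / m))

print("F =", F, " t =", t, " t squared =", t ** 2)
```

For these scores the within sum of squares is 20 on 8 df and the between sum is 22.5 on 1 df, so F is 9 and t is -3; the sign of t carries the direction of the difference, which F ignores.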

GROUPS OF UNEQUAL SIZE 

When the groups are of unequal size, with $m_g$ cases in the gth group, the sum of
squares for the gth group would be written as

$$\sum(X-\bar{X})^2 = \sum(X-\bar{X}_g)^2 + m_g(\bar{X}_g-\bar{X})^2$$

and the double summation over all groups would be

$$\sum\sum(X-\bar{X})^2 = \sum\sum(X-\bar{X}_g)^2 + \sum m_g(\bar{X}_g-\bar{X})^2$$

which differs from formula (15.2) in that $m_g$ is under the summation sign. The dfs
for total, within, and between are N - 1, N - G, and G - 1. The computational
formulas are changed to

$$\sum\sum(X-\bar{X})^2 = \sum\sum X^2 - \frac{\left(\sum\sum X\right)^2}{N} \qquad \text{for total sum} \quad (15.9)$$

$$\sum\sum(X-\bar{X}_g)^2 = \sum\sum X^2 - \sum\frac{\left(\sum X\right)^2}{m_g} \qquad \text{for within sum} \quad (15.10)$$

$$\sum m_g(\bar{X}_g-\bar{X})^2 = \sum\frac{\left(\sum X\right)^2}{m_g} - \frac{\left(\sum\sum X\right)^2}{N} \qquad \text{for between sum} \quad (15.11)$$

Note that the second term for the within sum (and the first term for the
between sum) requires that for each group the square of the sum be
divided by its m; then the several quotients are summed. An additional
row may be added to a table like Table 15.2 for these quotients, i.e.,
the row of $(\sum X)^2$ might be replaced by $(\sum X)^2/m_g$.

A variance table (like Table 15.3) may be set up as before. The
interpretation is the same as for groups of equal size,
i.e., if $s_b^2$ is significantly larger than $s_w^2$, it is inferred that real
differences exist between the groups. Although the variance ratio for
groups of unequal size does follow the F distribution,
when N cases are to be divided into G experimental
groups, it is preferable to assign m = N/G cases to each group, unless there
is a cost factor that is differential from experimental to experimental
condition.

The variance technique is applicable for testing the significance of the
difference between two or more means based on large or small samples
of equal or unequal size (per group) regardless of whether there is an a
priori basis for arranging the groups. It might be said parenthetically
that the scientific hypothesis being tested will specify the direction
of differences if such are expected.
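The computational formulas for unequal group sizes translate directly into code. The sketch below is illustrative only: the function name `anova_unequal` and the three small groups of scores are hypothetical. Each sum of squares is computed from raw-score sums as in formulas (15.9) to (15.11).

```python
def anova_unequal(groups):
    """Sums of squares, dfs, and F for groups of possibly unequal size."""
    N = sum(len(g) for g in groups)
    G = len(groups)
    sum_x = sum(sum(g) for g in groups)                        # sum of all raw scores
    sum_x2 = sum(x * x for g in groups for x in g)             # sum of all squared scores
    quotients = sum(sum(g) ** 2 / len(g) for g in groups)      # each (sum X)^2 divided by its m

    total = sum_x2 - sum_x ** 2 / N            # formula (15.9)
    within = sum_x2 - quotients                # formula (15.10)
    between = quotients - sum_x ** 2 / N       # formula (15.11)

    s2_b = between / (G - 1)
    s2_w = within / (N - G)
    return total, within, between, s2_b / s2_w


# Hypothetical scores: three groups of 3, 4, and 2 cases.
groups = [[2, 4, 3], [5, 7, 6, 8], [9, 10]]
total, within, between, F = anova_unequal(groups)
print(total, within, between, F)
```

For these scores the total, within, and between sums of squares are 60, 7.5, and 52.5, so the additive check holds, and F is (52.5/2)/(7.5/6) = 21.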

SIGNIFICANCE OF THE CORRELATION RATIO

It will be recalled that the within-arrays variance is the
same as the within-groups variance, the groups being defined on the basis
of intervals on another variable. Also, the variance of array means is the
same as between-groups variance. Hence the variance technique can be
applied to the correlation ratio. The grouping here is taken on the X
variable, and the required sums of squares will be in terms of Y. The sums
of squares and their respective degrees of freedom will be

$$\sum\sum(Y-\bar{Y})^2 \;\; (N-1), \qquad \sum\sum(Y-\bar{Y}_g)^2 \;\; (N-G), \qquad \sum m_g(\bar{Y}_g-\bar{Y})^2 \;\; (G-1)$$

in which $m_g$ is the number of cases in the gth array. From the
definition formula of the correlation ratio, we have

$$\eta^2 = 1 - \frac{S_{ay}^2}{S_y^2}$$

in which $S_{ay}^2$ is the within-array variance. This becomes, in the notation of
this chapter,

$$\eta^2 = 1 - \frac{\sum\sum(Y-\bar{Y}_g)^2/N}{\sum\sum(Y-\bar{Y})^2/N}$$

Since N cancels, we see that the following holds:

$$\sum\sum(Y-\bar{Y}_g)^2 = (1-\eta^2)\sum\sum(Y-\bar{Y})^2 = \text{within sum of squares} \qquad (15.12)$$

From the alternate expression for $\eta^2$, in terms of the variance of the array
means, $S_{my}^2$, we have

$$\eta^2 = \frac{S_{my}^2}{S_y^2}$$

which becomes

$$\eta^2 = \frac{\sum m_g(\bar{Y}_g-\bar{Y})^2/N}{\sum\sum(Y-\bar{Y})^2/N}$$

which leads to

$$\sum m_g(\bar{Y}_g-\bar{Y})^2 = \eta^2\sum\sum(Y-\bar{Y})^2 = \text{between sum of squares} \qquad (15.13)$$

When we wish to divide the sum of squares of formula (15.12) or (15.13)
by the proper df, we may choose either the left- or right-hand part as
representing the sum of squares. Thus the between-arrays estimate may be
written as

$$s_b^2 = \frac{\eta^2\sum\sum(Y-\bar{Y})^2}{G-1}$$

and that for within arrays as

$$s_w^2 = \frac{(1-\eta^2)\sum\sum(Y-\bar{Y})^2}{N-G}$$

The ratio, $F = s_b^2/s_w^2$, may be written as

$$F = \frac{\eta^2\sum\sum(Y-\bar{Y})^2/(G-1)}{(1-\eta^2)\sum\sum(Y-\bar{Y})^2/(N-G)} = \frac{\eta^2/(G-1)}{(1-\eta^2)/(N-G)}$$

It is accordingly seen that for fixed dfs the value of F, even though computed
from the sums rather than from their equivalents in terms of $\eta^2$, can
be thought of as depending on the size of $\eta^2$; therefore a significant F
indicates a significant correlation ratio.

With the three sums of squares computed, we can readily determine
whether any correlation in the sense of the correlation ratio exists, and
we also have the necessary sums for calculating $\eta$ if it is desired to have
this measure of the degree of correlation. A significant F does not, however,
mean a high correlation ratio; with N large, a low $\eta$ can possess
statistical significance.

The computation of the sums of squares is accomplished by means of
formulas (15.9-15.11) with the Xs replaced by Ys.
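That F may be obtained either from the sums of squares or from $\eta^2$ alone can be illustrated briefly. This is a sketch with hypothetical data: each inner list holds the Y scores falling in one X interval (one array), and the function name is an assumption made for the example.

```python
def correlation_ratio_test(arrays):
    """Return eta squared and F computed two ways for Y scores grouped into arrays."""
    N = sum(len(a) for a in arrays)
    G = len(arrays)
    grand_mean = sum(sum(a) for a in arrays) / N

    total = sum((y - grand_mean) ** 2 for a in arrays for y in a)
    between = sum(len(a) * (sum(a) / len(a) - grand_mean) ** 2 for a in arrays)
    within = sum((y - sum(a) / len(a)) ** 2 for a in arrays for y in a)

    eta2 = between / total                                    # from formula (15.13)
    F_from_sums = (between / (G - 1)) / (within / (N - G))
    F_from_eta = (eta2 / (G - 1)) / ((1 - eta2) / (N - G))
    return eta2, F_from_sums, F_from_eta


# Hypothetical Y scores in three arrays (X intervals).
arrays = [[1, 3, 2], [4, 6, 5, 5], [6, 8]]
eta2, F_from_sums, F_from_eta = correlation_ratio_test(arrays)
print(round(eta2, 4), round(F_from_sums, 4), round(F_from_eta, 4))
```

The two values of F agree apart from rounding in floating-point arithmetic, since the total sum of squares cancels from numerator and denominator.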








SIGNIFICANCE OF LINEAR CORRELATION 

An appreciable correlation between two variables which are linearly 
related implies that the slopes of the regression lines are not zero, which 
in turn implies that the variance of predicted values is large enough to 
have some kind of statistical significance. The variance technique may 
be used as a test of the significance of linear regression. 

Suppose that we develop the argument in terms of the regression of Y
on X. We may write the linear equation for predicting Y from X as
$Y' = BX + A$. If we think of this regression line as having been drawn on the
scatter diagram, we can readily see that the deviation of any person's Y
value from the mean of the Ys can be expressed in terms of its deviation
from the regression line (or predicted value) plus the deviation of the
predicted value from the mean of the Ys:

$$(Y-\bar{Y}) = (Y-Y') + (Y'-\bar{Y})$$

in which Y' will vary from person to person in accordance with his X
score. If we square all such $(Y-\bar{Y})$ deviations and sum over all cases we
get

$$\sum\sum(Y-\bar{Y})^2 = \sum\left[(Y-Y') + (Y'-\bar{Y})\right]^2 = \sum(Y-Y')^2 + \sum(Y'-\bar{Y})^2 + 2\sum(Y-Y')(Y'-\bar{Y})$$

for which double summation signs are not needed for clarity even though
the summing is over all cases. The last or cross-product term has to do
with a possible relationship between predicted values and residuals, but as
was shown in Chapter 9, this correlation is always zero, and hence this
last term vanishes.

Therefore the sum of squares can be broken down into two components:
residuals or within arrays about the regression line and a part depending
on the variation of the predicted values about the mean. If the correlation
between X and Y were zero, this latter component would be zero because
$\bar{Y}$ would be predicted for all cases. The departure of this sum of squares or
of a variance estimate based thereon from zero might lead us to conclude
that real correlation exists in the population being sampled if it were not
for the fact that sampling errors ordinarily operate so as to prevent the
obtaining of zero correlation.

Before attempting to understand the operation of chance sampling we
should consider the degrees of freedom associated with the sums of squares.
As usual, the total sum of squares is based on N - 1 degrees of freedom.
The df for $\sum(Y-Y')^2$ may not be immediately obvious, but note that if
N = 2 and variation exists for both X and Y, the regression line would
necessarily pass through the two points defined by the pair of scores, r
would be unity, and $\sum(Y-Y')^2$ would be zero. In other words, with
N = 2 there is no freedom for deviation from the regression line. From
this it would be inferred that N needs to be reduced by 2, or that
df = N - 2, a deduction which is consistent with the fact that, in fitting a
straight line, two constants are determined from the data, and hence two
restrictions are imposed on the N deviations of the type $(Y-Y')$.

Since the dfs for the component sums of squares are additive to that for
the total, we can determine the df for the regression or $\sum(Y'-\bar{Y})^2$ term
by subtracting the df for residuals from that for the total:
(N - 1) - (N - 2) = 1 as the df for the regression term. But determination of a
df by subtraction does not permit the additive check on the correctness of
the dfs which is possible in case each df is ascertained separately on the
basis of some principle. By what principle could we determine that for the
regression sum of squares the proper df is 1? The value of $\sum(Y'-\bar{Y})^2$
will not be changed by shifting from gross scores to deviation scores, i.e.,
by moving the origin to the intersection of $\bar{X}$ and $\bar{Y}$. It will be recalled that
the regression equation in deviation units is $y' = bx$ (where b = B of the
gross score form), and accordingly we may write

$$\sum(Y'-\bar{Y})^2 = \sum(y'-\bar{y})^2 = \sum(y'-0)^2 = \sum(bx)^2 = b^2\sum x^2$$

which permits us to examine the source or sources of variation in the
regression sum of squares. Its value depends on $b^2$ and $\sum x^2$, but the value
of $\sum x^2$ does not depend on the degree of correlation. For a fixed set of
Xs the freedom of $\sum(Y'-\bar{Y})^2$ to vary springs from b, i.e., from one value;
therefore the df is 1. A slightly different way of considering the question
is to note that since $b = r(S_y/S_x)$ and $\sum x^2 = NS_x^2$, the sum of the squares
of the predicted values can be written as

$$\sum(Y'-\bar{Y})^2 = r^2\frac{S_y^2}{S_x^2}(NS_x^2) = Nr^2S_y^2$$

from which it can be argued that, since the variation in predicted values is a
function neither of N nor of the variance of the trait being predicted, it is a
function of one value, the degree of correlation.

Now let us return to a brief consideration of sampling or of the meaning
of the variance estimates which result from dividing the sums of squares
by their dfs. On the basis of the null hypothesis, that the degree of linear
correlation is zero for the population being sampled, the regression line
for the population would pass through $\bar{Y}$, with zero slope or parallel
to the x axis. Hence $(Y-Y')$ will equal $(Y-\bar{Y})$, and the variance of
the residuals will equal the total variance of the Ys. A sample from
the population will seldom yield zero correlation (or zero regression), and
therefore the residuals will tend to be somewhat reduced, or $\sum(Y-Y')^2$
will tend to be less than $\sum\sum(Y-\bar{Y})^2$. It can be shown that
$\sum(Y-Y')^2/(N-2)$ gives an unbiased estimate of the population variance when no
correlation exists in the population.

That the estimate based on the regression sum of squares, $\sum(Y'-\bar{Y})^2$,
divided by df = 1, is also an unbiased estimate of the same population
variance may not seem plausible, nor is it easily explained in an elementary
treatment. For any sample, $\sum(Y'-\bar{Y})^2$ equals the difference between
$\sum\sum(Y-\bar{Y})^2$ and $\sum(Y-Y')^2$, and it can be demonstrated that on the
average the value of $\sum\sum(Y-\bar{Y})^2 - \sum(Y-Y')^2$ will equal
$\sum\sum(Y-\bar{Y})^2/(N-1)$, i.e., the mean value of $\sum(Y'-\bar{Y})^2$ for successive samples will
be $\sum\sum(Y-\bar{Y})^2/(N-1)$. Since the latter is an unbiased estimate of the
population variance, it follows that $\sum(Y'-\bar{Y})^2/1$ must be an estimate of
the same variance.

Of the three variance estimates, only the estimates based on residuals
and on regression are independent. The sampling distribution of their
ratio is that of F. Let $s_r^2$ stand for the estimate based on the residual sum
of squares and $s_p^2$ stand for the estimate based on predictions by a linear
regression function. Then, if $s_p^2/s_r^2$, with $n_1 = 1$ and $n_2 = N - 2$, falls at
or beyond the .01 level of significance, the null hypothesis becomes suspect.
This means that the $s_p^2$ estimate is larger than expected on the basis of
sampling, from which it may be inferred that regression is a real source of
variation in $\sum\sum(Y-\bar{Y})^2$, i.e., that the slope of the regression for the
population is not zero, or that some correlation exists.

We have already noted that

$$\sum(Y'-\bar{Y})^2 = Nr^2S_y^2$$

Since $\sum(Y-Y')^2$ divided by N equals the error of estimate variance
previously proved to equal $S_y^2(1-r^2)$, it follows readily that

$$\sum(Y-Y')^2 = N(1-r^2)S_y^2$$

Accordingly

$$s_p^2 = \frac{\sum(Y'-\bar{Y})^2}{1} = \frac{Nr^2S_y^2}{1}$$

and

$$s_r^2 = \frac{\sum(Y-Y')^2}{N-2} = \frac{N(1-r^2)S_y^2}{N-2}$$

Therefore

$$F = \frac{Nr^2S_y^2/1}{N(1-r^2)S_y^2/(N-2)} = \frac{r^2}{(1-r^2)/(N-2)}$$

which is the square of the t, formula (10.2), for testing the significance of r.
Thus, again we have $F = t^2$, when $n_1 = 1$.

The reader will have noted that, since the required sums of squares and
the resulting F can readily be expressed in terms of r, there is no need to 
worry further about a computational scheme for securing the sums of 
squares. The easier thing to do is simply to compute r. After that is done, 
either the F or the t test may be used for judging whether the correlation 
is significant. This discussion of the linear correlation problem here should 
help the student appreciate the generality of the analysis of variance 
technique and should also provide him with relevant concepts for under¬ 
standing the test for curvilinearity of regression, to which we now turn. 
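The equivalence of the variance-ratio test for linear regression and the t test for r can be checked on a small example. The paired scores below are hypothetical, and the t used for comparison is the familiar $t = r\sqrt{N-2}/\sqrt{1-r^2}$, which is assumed here to be what formula (10.2) expresses.

```python
from math import sqrt

# Hypothetical paired scores.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
N = len(xs)

mean_x = sum(xs) / N
mean_y = sum(ys) / N
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)                  # total sum of squares
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

B = sxy / sxx                        # slope of the regression of Y on X
A = mean_y - B * mean_x              # intercept
predicted = [A + B * x for x in xs]

ss_regression = sum((p - mean_y) ** 2 for p in predicted)      # df = 1
ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted)) # df = N - 2

r = sxy / sqrt(sxx * syy)
F_from_sums = (ss_regression / 1) / (ss_residual / (N - 2))
F_from_r = r ** 2 / ((1 - r ** 2) / (N - 2))
t = r * sqrt(N - 2) / sqrt(1 - r ** 2)

print(round(F_from_sums, 4), round(F_from_r, 4), round(t ** 2, 4))
```

The three printed values coincide, and the regression and residual sums of squares add to the total, which is the additive check on the partition.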


TESTING LINEARITY OF REGRESSION 

We have seen that the correlation ratio is a general measure of the 
degree of correlation and that r measures the degree of linear relationship. 
Even though the regression of Y on X for a population be exactly linear, 
it will be found for a sample that the means of the arrays will show some 
deviation from a straight line; hence as previously pointed out, the correlation
ratio will tend to be larger than r. How large should the difference
between $\eta$ and r be before we suspect nonlinearity, or how much can the
array means deviate from a straight line by chance? Before the development
of the analysis of variance technique, the inadequate Blakeman
criterion was used to answer the foregoing. In presenting the currently
accepted method, we shall carry the argument through on the basis of the 
regression of Y on X. 

Imagine a scatter diagram with regression line drawn and the array
mean located in each vertical array. For a score in the gth array, the
deviation of Y from $\bar{Y}$ can be thought of in terms of its deviation from the
array mean, $\bar{Y}_g$, plus the deviation of the array mean from the predicted
value, $Y'_g$, plus the deviation of the predicted value from the total mean.
In symbols,

$$(Y-\bar{Y}) = (Y-\bar{Y}_g) + (\bar{Y}_g-Y'_g) + (Y'_g-\bar{Y})$$

Squaring and summing for the $m_g$ cases in each array and then summing
over all G arrays (equivalent to summing over all groups), we have

$$\sum\sum(Y-\bar{Y})^2 = \sum\sum(Y-\bar{Y}_g)^2 + \sum m_g(\bar{Y}_g-Y'_g)^2 + \sum m_g(Y'_g-\bar{Y})^2$$

the cross-product terms having vanished because the component parts are
uncorrelated.

The first component is a sum of squares based on within-array variation
with N - G degrees of freedom. We encountered this in checking the
significance of the correlation ratio, and we then labeled as $s_w^2$ the variance
estimate based thereon.

The second sum involves deviations of array means from linear regression.
Its df will be G - 2 since there are G means and two restrictive
constants in $Y'_g$. If G = 2, the two means cannot vary from the fitted line.
Let us use $s_d^2$ as a symbol for the variance estimate based on this sum of
squares.

The third sum, which has to do with the part of the total variance
predictable by means of linear regression, is very similar to that occurring
a few pages earlier in connection with the F test of the correlation coefficient.
It differs only in that the same value is predicted for all cases within
an array regardless of their location in the X interval defining the array.
This is equivalent to a linear prediction of the mean of the array. Actually,
the numerical value of $\sum(Y'-\bar{Y})^2$ as calculated by $Nr^2S_y^2$, which equals
$r^2\sum\sum(Y-\bar{Y})^2$, will be the same as $\sum m_g(Y'_g-\bar{Y})^2$ computed directly,
provided r was originally determined from a scatter diagram with the same
intervals now being used to define the arrays. We have already seen that
the df for this sum is 1, and we have used $s_p^2$ as a symbol for the estimate
based thereon.

It will be recalled that, in the scheme for testing the significance of the
correlation ratio, the total sum of squares was broken down into a within-array
and a between-array part. We now have a breakdown into within
array (as before) plus two additional parts: the sum $\sum m_g(\bar{Y}_g-\bar{Y})^2$ is
broken into

$$\sum m_g(\bar{Y}_g-Y'_g)^2 + \sum m_g(Y'_g-\bar{Y})^2$$

It will also be recalled that

$$\sum m_g(\bar{Y}_g-\bar{Y})^2 = \eta^2\sum\sum(Y-\bar{Y})^2$$

and that

$$\sum m_g(Y'_g-\bar{Y})^2 = r^2\sum\sum(Y-\bar{Y})^2$$

By subtraction, we see that the new sum, $\sum m_g(\bar{Y}_g-Y'_g)^2$, is equivalent
to $(\eta^2-r^2)\sum\sum(Y-\bar{Y})^2$.

For convenience, we shall now assemble in an analysis of variance table
the several symbolic expressions having to do with testing the significance
of (1) the correlation ratio, (2) the linear regression coefficient, and (3)
nonlinearity of regression. Table 15.4 gives the sources of variation, the
sums of squares and their equivalents in terms of r or $\eta$, the degrees of
freedom, and a symbol for each of the variance estimates. Note, in review,
that for the sums of squares, their equivalents, and the dfs, the following
additions hold true:

(a) + (b) = (c)

(a) + (e) = (f)

(c) + (d) = (f)

(a) + (b) + (d) = (f)

Table 15.4. Analysis of variance functions for bivariate correlation

Source of Variation                    Sum of Squares = Equivalent                                              df       Estimate
(a) Linear regression                  $\sum m_g(Y'_g-\bar{Y})^2 = r^2\sum\sum(Y-\bar{Y})^2$                    1        $s_p^2$
(b) Deviation of means from line       $\sum m_g(\bar{Y}_g-Y'_g)^2 = (\eta^2-r^2)\sum\sum(Y-\bar{Y})^2$         G - 2    $s_d^2$
(c) Between-array means                $\sum m_g(\bar{Y}_g-\bar{Y})^2 = \eta^2\sum\sum(Y-\bar{Y})^2$            G - 1    $s_b^2$
(d) Within arrays                      $\sum\sum(Y-\bar{Y}_g)^2 = (1-\eta^2)\sum\sum(Y-\bar{Y})^2$              N - G    $s_w^2$
(e) Residual from line                 $\sum\sum(Y-Y'_g)^2 = (1-r^2)\sum\sum(Y-\bar{Y})^2$                      N - 2    $s_r^2$
(f) Total                              $\sum\sum(Y-\bar{Y})^2$                                                  N - 1

The several useful and permissible Fs, or ratios of independent and
unbiased variance estimates, along with the proper dfs (or $n_1$ and $n_2$ values)
for entering the table of F, may be stated in summary form:

$F_1 = s_b^2/s_w^2$; $n_1 = G - 1$, $n_2 = N - G$: significance of correlation ratio

$F_2 = s_p^2/s_r^2$; $n_1 = 1$, $n_2 = N - 2$: significance of linear correlation

$F_3 = s_d^2/s_w^2$; $n_1 = G - 2$, $n_2 = N - G$: significance of curvilinearity

We have already discussed the first two of these Fs. If we write the third
in terms of sums and dfs, we have

$$F_3 = \frac{s_d^2}{s_w^2} = \frac{\sum m_g(\bar{Y}_g-Y'_g)^2/(G-2)}{\sum\sum(Y-\bar{Y}_g)^2/(N-G)} = \frac{(\eta^2-r^2)\sum\sum(Y-\bar{Y})^2/(G-2)}{(1-\eta^2)\sum\sum(Y-\bar{Y})^2/(N-G)} = \frac{(\eta^2-r^2)/(G-2)}{(1-\eta^2)/(N-G)}$$

which indicates definitely that its value, for given dfs, is a reflection of the
difference between the correlation ratio and the correlation coefficient.
Therefore, in testing the significance of the variation of array means from
linearity, we are testing the significance of the difference between
$\eta$ and r. If $F_3$ reaches the .01 probability level, the hypothesis of linear
regression for the population being sampled is rejected. When this
happens, the correlation coefficient and a linear regression
function are not appropriate for describing the relationship.

In testing the significance of the correlation
ratio for X on Y and the linearity of the horizontal array means, the analysis
is carried through with Xs substituted for Ys. Since the number of
intervals for the two variables need not be the same, the value of G
may differ for the two analyses.


ILLUSTRATIVE PROBLEM: r, $\eta$, AND CURVILINEARITY 

The foregoing three tests of significance and the computations necessary 
thereto may be illustrated by the data of Table 15.5, which gives the 
bivariate distribution for the relationship between initial (sum of scores on 


Table 15.5. Bivariate scatter for initial and final scores of 92 boys on Koerth 
pursuit rotor 

Y = Final score   Code   X = Initial score:  0   30   60   90   120   150   180   210   $f_y$ 

740 

11 










700 

10 


1 

2 






1 

660 

620 

580 

9 

8 

7 

1 

2 

3 

1 

8 

3 

1 

2 

7 

4 

2 

* 

1 

3 

2 


2 

1 

1 

2 

2 

9 

13 

17 

540 

6 

2 

8 

5 


1 



1 

16 

500 

5 

2 

5 

3 






15 

460 

4 

3 

1 







11 

420 

3 

2 





1 


4 

380 

2 









2 

340 

1 

3 









300 

0 

1 








3 

1 

X (initial score)     0        30        60        90        120       150    180       210       Total 

$f_x = m_g$           19       27        20        10        7         0      4         5         92 = N 
$\sum Y$              89       181       139       85        60        0      37        45        636 
$\sum Y^2$            547      1269      1007      747       520       0      345       411       4846 
$(\sum Y)^2 / m_g$    416.89   1213.37   966.05    722.50    514.29    0      342.25    405.00    4580.35 


trials 1-4) and final (trials 67-70) performance on the Koerth pursuit 
rotor. Since it is logical to be concerned with the prediction of final from 
initial scores, the regression of Y on X shall be examined. 

In the first place, the correlation coefficient is computed from the scatter 
diagram by the method given in Chapter 8. Its value of .5687 is about .01 
lower than the coefficient computed from a scatter with twice as many 
intervals. The use of so few intervals for the X variable would obviously 
not be recommended for the computation of r, but in this illustration it is 
convenient because of page-space limitations. There is the additional 
consideration that for computing the correlation ratio we should avoid 
having too few cases per array, which if the sample is small may mean only 
a few intervals on the independent variable. At least twelve intervals should 
be used for the dependent variable. In checking on linearity, it is necessary 
that we calculate r from a scatter with the same grouping intervals used in 
computing $\eta$, and no corrections for grouping error are needed. 

For the computation of the correlation ratio and for the testing of its 
significance, we need the within arrays, the between arrays, and the total 
sum of squares. These may be computed from coded scores (deviations 
from an arbitrary origin in terms of step intervals), and the entire analysis 
may be carried through on the basis of coded scores, so that cumber- 
somely large figures are avoided. The reader who wishes to follow the 
computational procedure will need to note the following features of Table 
15.5. The marginal frequencies on the right are for all the Y scores, and 
the $f_x$s along the bottom margin are the $m_g$s, or cases per array. For 
each vertical array and for the right-hand margin, $\sum Y$ and $\sum Y^2$ 
are computed in terms of coded values (these correspond to the coded sums 
and sums of squares of Chapter 3). Summing across the $\sum Y$ and 
$\sum Y^2$ rows should yield the $\sum Y$ and $\sum Y^2$ obtained from the 
marginal distribution. For this problem, $\sum\sum Y = 636$ and 
$\sum\sum Y^2 = 4846$. The last row, containing the several values of 
$(\sum Y)^2 / m_g$, is summed across for the needed 
$\sum_g [(\sum Y)^2 / m_g]$, which is 4580.35 in this example. There is no 
check on this figure by calculations based on the margin. 

In order to get the sums of squares of deviations, the values 636, 4846, 
and 4580.35 are substituted in formulas (15.9-15.11) with X replaced by Y. 

$$\sum\sum (Y - \bar{Y})^2 = 4846 - \frac{(636)^2}{92} = 449.30$$ 

$$\sum\sum (Y - \bar{Y}_g)^2 = 4846 - 4580.35 = 265.65$$ 

$$\sum_g m_g (\bar{Y}_g - \bar{Y})^2 = 4580.35 - \frac{(636)^2}{92} = 183.65$$ 

By formula (15.13) we now obtain 

$$\eta^2 = \frac{183.65}{449.30} = .40874; \qquad \eta = .639$$ 

which is the correlation ratio for Y on X. 
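
The arithmetic above can be retraced with a short Python sketch. It is only an illustration: the function name `correlation_ratio_sums` and the list names are hypothetical, and the input figures are the three summary rows ($m_g$, $\sum Y$, $\sum Y^2$) from the bottom of Table 15.5, in coded units. Because the sketch carries full precision, its results may differ from the rounded textbook figures in the last decimal place.

```python
from math import sqrt

# Per-array summaries from the bottom margin of Table 15.5 (coded Y values).
m_g = [19, 27, 20, 10, 7, 0, 4, 5]                  # cases per array
sum_y = [89, 181, 139, 85, 60, 0, 37, 45]           # sum of Y within each array
sum_y2 = [547, 1269, 1007, 747, 520, 0, 345, 411]   # sum of Y^2 within each array


def correlation_ratio_sums(m_g, sum_y, sum_y2):
    """Return (total SS, within SS, between SS, eta squared)."""
    n = sum(m_g)
    grand_sum = sum(sum_y)
    grand_sum_sq = sum(sum_y2)
    correction = grand_sum ** 2 / n
    # Sum of (sum Y)^2 / m_g over the non-empty arrays only.
    array_term = sum(s * s / m for s, m in zip(sum_y, m_g) if m > 0)
    ss_total = grand_sum_sq - correction
    ss_within = grand_sum_sq - array_term
    ss_between = array_term - correction
    return ss_total, ss_within, ss_between, ss_between / ss_total


ss_total, ss_within, ss_between, eta_sq = correlation_ratio_sums(m_g, sum_y, sum_y2)
eta = sqrt(eta_sq)
print(round(ss_total, 2), round(ss_within, 2), round(eta, 3))
```

Note that the empty array (X = 150) is skipped when forming $(\sum Y)^2 / m_g$, which is why only seven arrays contribute to the between-arrays sum.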
The other sums of squares called for in schematic Table 15.4 may be 
calculated from their equivalents in terms of $r^2$ and/or $\eta^2$. Note 
that, with $r^2 = (.5687)^2 = .32342$, 

$$\sum_g m_g (Y'_g - \bar{Y})^2 = (.32342)(449.30) = 145.31$$ 

$$\sum\sum (Y - Y'_g)^2 = (1 - .32342)(449.30) = 303.99$$ 

$$\sum_g m_g (\bar{Y}_g - Y'_g)^2 = (.40874 - .32342)(449.30) = 38.34$$ 


Table 15.6. Analysis of variance table for regression of final (Y) on initial score 
for data of Table 15.5 

Source                          Sum of Squares    df    Variance Estimate 

Linear regression               145.31            1     145.31 = $s_p^2$ 
Deviation of means from line    38.34             5     7.67 = $s_d^2$ 
Between-array means             183.65            6     30.61 = $s_b^2$ 
Within arrays                   265.65            85    3.13 = $s_w^2$ 
Residual from line              303.99            90    3.38 = $s_r^2$ 
Total                           449.30            91 



The several sums of squares and their respective degrees of freedom are set 
forth in Table 15.6, which contains also the variance estimates obtained 
by dividing the sums of squares by their dfs. From these variance estimates 
we have the following. 

For testing the significance of the correlation ratio we have $F_1$ 
= 30.61/3.13 = 9.8, which for $n_1 = 6$ and $n_2 = 85$ is highly significant. 
The .001 level of significance requires an F of about 4.0. 

For testing the significance of linear correlation, i.e., r, we have $F_2$ 
= 145.31/3.38 = 43.0, which for $n_1 = 1$ and $n_2 = 90$ is likewise highly 
significant, the .001 level being at an F of about 11.6. 

For testing linearity of regression, i.e., the departure of the array means 
from a straight line, we have $F_3$ = 7.67/3.13 = 2.5, which for $n_1 = 5$ and 
$n_2 = 85$ is near the .05 level of significance. Thus the apparent departure 
from linearity in Table 15.5 is not sufficiently great to lead to rejection of 
the hypothesis of linearity; we would, however, question the hypothesis. 
This is an example of borderline significance which calls for drawing 
another sample or adding more cases before we set forth a conclusion. 
For the problem at hand, a second sample of 90 boys yields a scatter 
diagram much like that of Table 15.5, so we would reject the hypothesis of 
linearity of regression. 
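
As a cross-check, the three F ratios can also be obtained directly from r, $\eta^2$, N, and G, since the total sum of squares cancels from every ratio. The Python sketch below assumes that form; the function name `regression_f_ratios` is hypothetical, and the inputs are the rounded values quoted in this problem (r = .5687, $\eta^2$ = .40874, N = 92, G = 7 non-empty arrays).

```python
def regression_f_ratios(r, eta_sq, n, g):
    """F ratios for the correlation ratio, linear correlation, and curvilinearity.

    Each ratio is written in terms of r^2 and eta^2 only; the common factor
    (the total sum of squares) cancels from numerator and denominator.
    """
    r_sq = r * r
    within = (1 - eta_sq) / (n - g)                  # within-arrays estimate
    f_eta = (eta_sq / (g - 1)) / within              # n1 = G - 1, n2 = N - G
    f_linear = r_sq / ((1 - r_sq) / (n - 2))         # n1 = 1,     n2 = N - 2
    f_curve = ((eta_sq - r_sq) / (g - 2)) / within   # n1 = G - 2, n2 = N - G
    return f_eta, f_linear, f_curve


f1, f2, f3 = regression_f_ratios(r=0.5687, eta_sq=0.40874, n=92, g=7)
print(round(f1, 1), round(f2, 1), round(f3, 2))
```

The values agree with the ratios of variance estimates in Table 15.6 to within rounding. Judging significance still requires a table of F (or an equivalent routine) for the stated $n_1$ and $n_2$.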


The student should keep in mind that the test for linearity can lead to 
the definite conclusion that the regression is curvilinear (if F is large 
enough), whereas a low F does not prove linearity. Why? 
If the hypothesis of linearity is disproved, it follows that the correlation 
coefficient is not a suitable figure for describing the relationship. The 
correlation ratio can be used to describe the degree of association, but the 
form of the relationship should be described by a fitted curve or by a verbal 
description of the general curve tendency of the array means. Some readers 
will have noted that the correlation ratio cannot be considered very 
descriptive of the data of Table 15.5 because of heteroscedasticity. 

APPLICATION TO MULTIPLE CORRELATION 

The reader may recall that the methods given in Chapter 11 for judging 
the significance of the multiple correlation coefficient involved 
unsatisfactory approximations. Insofar as we are interested in testing the 
deviation of a multiple r from zero, the analysis of variance technique 
provides an exact test which is applicable when the sample is small. 

Let us suppose that Y is a dependent variable which is to be predicted 
by a multiple regression equation containing m independent variables 
designated by Xs. The prediction equation may be written as 

$$Y' = A + B_1 X_1 + B_2 X_2 + \cdots + B_m X_m$$ 

in which the Bs are the regression coefficients. The deviation of any 
individual's Y score from the mean Y can be expressed as the sum of two 
parts: the deviation of his Y from his predicted value plus the deviation 
of the predicted value from the mean of the Ys, thus, 

$$(Y - \bar{Y}) = (Y - Y') + (Y' - \bar{Y})$$ 

If we square both sides and sum over all cases, we have 

$$\sum (Y - \bar{Y})^2 = \sum (Y - Y')^2 + \sum (Y' - \bar{Y})^2$$ 

which is exactly analogous to the breakdown used in connection with the 
test of the linear correlation coefficient. One part has to do with residuals 
about the regression plane, the other with variations in the predicted values. 
The cross-product term again vanishes; i.e., it can be shown that there is no 
correlation between residuals and predicted values. 

As previously, we label the $\sum (Y - Y')^2$ as the residual sum of squares 
and $\sum (Y' - \bar{Y})^2$ as the regression sum of squares. The total sum 
of squares will, of course, have N - 1 degrees of freedom. The residual sum 
of squares will lose dfs according to the number of constants in the 
regression equation. We have the constant A, and the number of B constants 
is m; hence df = N - (m + 1) = N - m - 1 for the residual term. The reader 
who does not immediately see the reasonableness of this should consider 
the case of one dependent and two independent variables with varying 
scores on N = 3 cases. Imagine that the three scores for each case can be 
used to locate a point for each in three-dimensional space, and then think 
of fitting an ordinary plane to these three points. Obviously the plane can 
be made to pass through all three; hence the prediction would be perfect 
and there would be no freedom for any of the three points to vary from the 
plane. 

As for the regression sum of squares, the predicted values (for given 
scores on the independent variables) depend on the slopes of the regression 
plane, or on the Bs. There being m of these, there are m ways in which the 
sum may vary; therefore df = m. This is an extension of the argument used 
to explain why df = 1 for testing the linear correlation coefficient. If our 
df determinations are correct we should have (N - m - 1) + m = N - 1, 
which is seen to be the case. 

In Chapter 11 the multiple correlation coefficient was expressed as 

$$R^2 = 1 - \frac{s_{1.23 \cdots m}^2}{s_1^2}$$ 

in which $s_{1.23 \cdots m}^2$ represents the residual variance and $s_1^2$ 
is the variance for the dependent variable. Since the residual variance plus 
the predicted variance adds to the total, the multiple r can also be 
expressed as the ratio of the predicted to the total variance. (Note that we 
are here speaking of the square of the multiple r.) By definition, the 
residual variance is $\sum (Y - Y')^2 / N$, the predicted variance is 
$\sum (Y' - \bar{Y})^2 / N$, and the total variance is 
$\sum (Y - \bar{Y})^2 / N$. We may therefore write the multiple correlation 
coefficient, using R in order to avoid subscripts, as 

$$R^2 = 1 - \frac{\sum (Y - Y')^2 / N}{\sum (Y - \bar{Y})^2 / N}$$ 

from which it is readily seen that 

$$\sum (Y - Y')^2 = (1 - R^2) \sum (Y - \bar{Y})^2$$ 

From the alternative way of regarding multiple correlation, we have 

$$R^2 = \frac{\sum (Y' - \bar{Y})^2 / N}{\sum (Y - \bar{Y})^2 / N}$$ 

which leads to $\sum (Y' - \bar{Y})^2 = R^2 \sum (Y - \bar{Y})^2$. 

Thus the sums of squares have their equivalents in terms of R and 
consequently may be set up as in Table 15.7. As in testing the linear 
correlation coefficient, we set the null hypothesis to the effect that the 
estimate based on the regression sum of squares will differ from that based 
on the residual sum only because of chance sampling. The null hypothesis 
implies that, if the entire population were measured, the correlation of the 
dependent variable with each independent variable would be zero, i.e., 
there would be no multiple correlation. When a 


Table 15.7. Variance setup for testing significance of multiple 
correlation coefficient 

Source        Sum of Squares = Equivalent                                      df           Estimate 

Regression    $\sum (Y' - \bar{Y})^2 = R^2 \sum (Y - \bar{Y})^2$               m            (sum of squares)/m 
Residual      $\sum (Y - Y')^2 = (1 - R^2) \sum (Y - \bar{Y})^2$               N - m - 1    (sum of squares)/(N - m - 1) 
Total         $\sum (Y - \bar{Y})^2$                                           N - 1 



sample is drawn from such a population, R will differ from zero only at 
random. Note that 

$$F = \frac{\sum (Y' - \bar{Y})^2 / m}{\sum (Y - Y')^2 / (N - m - 1)} = \frac{R^2 \sum (Y - \bar{Y})^2 / m}{(1 - R^2) \sum (Y - \bar{Y})^2 / (N - m - 1)}$$ 

$$F = \frac{R^2 / m}{(1 - R^2) / (N - m - 1)}$$ 
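
The final form of this F needs only $R^2$, N, and m. A minimal Python sketch follows; the function name `multiple_r_f` and the numbers in the example call are hypothetical illustrations, not data from the text.

```python
def multiple_r_f(r_sq, n, m):
    """F for testing a multiple R against zero.

    r_sq: squared multiple correlation; n: number of cases;
    m: number of independent variables.
    Returns (F, n1, n2) with n1 = m and n2 = N - m - 1.
    """
    if not 0 <= r_sq < 1:
        raise ValueError("r_sq must lie in [0, 1)")
    df_regression = m
    df_residual = n - m - 1
    if df_residual <= 0:
        raise ValueError("need N > m + 1 for a residual estimate")
    f = (r_sq / df_regression) / ((1 - r_sq) / df_residual)
    return f, df_regression, df_residual


# Hypothetical example: R^2 = .36 from N = 50 cases and m = 3 predictors.
f, n1, n2 = multiple_r_f(r_sq=0.36, n=50, m=3)
print(round(f, 3), n1, n2)
```

The guard on the residual df reflects the point made earlier: with N = m + 1 cases the regression plane fits perfectly and no residual variance estimate exists.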
The use of best or unbiased estimates in place of actual variances in 
formula (11.7) leads to an unbiased R, or one which should provide an 
unbiased estimate of the population value of R. Formula (11.14) gives this 
estimate; except when N is small or when m is large, the improvement is 
negligible. It should be stressed that neither the analysis of variance test 
of the significance of R nor the improved estimate of R allows for the 
fallacy involved in multiple correlation work when from among a large 
number of variables a few are chosen for inclusion in the analysis because 
they show correlation with the criterion. Such selection tends to capitalize 
on rs that are among the highest partly because of chance errors. 

A related question arises when we wonder whether the inclusion or addition 
of other variables in the multiple regression equation leads to a 
significant increase in the accuracy of prediction, or when we wish to know 
whether the dropping of certain variables results in a significant decrease 
in the accuracy of prediction. The inclusion of additional variables in the 
regression equation will reduce the error of estimate somewhat and leads 
to an increase in R. How can we judge whether the increase in R possesses 
statistical significance? 

Let $R_1$ be the multiple based on $m_1$ independent variables and $R_2$ 
be the value based on $m_2$ variables selected from among the $m_1$ 
variables. To test the difference we use 

$$F = \frac{(R_1^2 - R_2^2) / (m_1 - m_2)}{(1 - R_1^2) / (N - m_1 - 1)}$$ 

with $n_1 = m_1 - m_2$ and $n_2 = N - m_1 - 1$, to judge whether the gain 
from using the additional variable or variables possesses statistical 
significance. 
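
A sketch of the test for the gain from additional variables is given below in Python. It assumes the usual form of this test, in which the increase in $R^2$ is divided by the number of added variables and compared with the residual estimate from the fuller equation; the function name `r_increment_f` and the example figures are hypothetical.

```python
def r_increment_f(r1_sq, r2_sq, n, m1, m2):
    """F for the increase from R2 (m2 variables) to R1 (m1 variables).

    The m2 variables are assumed to be a subset of the m1 variables.
    Returns (F, n1, n2) with n1 = m1 - m2 and n2 = N - m1 - 1.
    """
    if m1 <= m2:
        raise ValueError("m1 must exceed m2")
    if r1_sq < r2_sq:
        raise ValueError("R1^2 cannot be smaller than R2^2 for nested sets")
    n1 = m1 - m2
    n2 = n - m1 - 1
    f = ((r1_sq - r2_sq) / n1) / ((1 - r1_sq) / n2)
    return f, n1, n2


# Hypothetical example: R^2 rises from .40 (2 variables) to .50 (4 variables), N = 105.
f_gain, df1, df2 = r_increment_f(r1_sq=0.50, r2_sq=0.40, n=105, m1=4, m2=2)
print(round(f_gain, 2), df1, df2)
```

Setting `m2 = 0` and `r2_sq = 0` reduces this to the test of a single multiple R against zero.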


INTRACLASS CORRELATION 


Of a"reS„1«” Lr^L d v 1™i7“*‘"“ ™»> in 

and if we attempt to make a scatter H • reme nts on just one variable, 
Of deciding which member of a pli r aTJ * ^ ^ ^ pr ° blem 
which to the other This can he re i a u A ’ }° assign to °ne axis and 

“5 * i " sl ' 

m,hes (0r S rou P s or classes) with m cases per 
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+ (in 


S l T V'" ^ W 

and for within families. If „/ » within-family variation, r' 

significant positive r Note tha .d Jthere «no ^ ^ ^ 

becomes unity. Note also tha y 2 one j s se ldom confronted with 

the necessity for trying to p g avera ge of the m„ values 

This does not affect 

; S he US ; d test ITway of judging the is 

The ^S“?sfoS "S- way of orclering 
that we have G sets of scoresi on ju bm } It is obvio us that r' 

t r=sreSiys »*■&- - ■» *• ^ 

have been defined. 

SELECTED CONTRASTS 

When the over-all analysis points to differences among the G groups, we 
may wish to test whether two selected means differ (designated as a type D 
contrast) or whether one mean (or the average of two or more group means) 
differs more than chance from the average of other means (characterized 
here as a type D' contrast). We need to distinguish between two reasons 
for making such tests: we may wish to do so because an a priori hypothesis 
calls for examining a given contrast, or we may wish to make certain 
comparisons suggested by the data. For the former, a t test is appropriate, 
whereas for the latter the t test is misleading in that such snooping is apt 
to pick out differences that are the largest, with a resultant vitiating of 
the level of significance. 

Regardless of motivation, we have either a D or a D' or both and need an 
appropriate standard error. Suppose we have G = 5 groups, with means 
$\bar{X}_1, \bar{X}_2, \bar{X}_3, \bar{X}_4, \bar{X}_5$. We might have 
$D = \bar{X}_a - \bar{X}_b$ or 
$D' = (\bar{X}_1 + \bar{X}_3 + \bar{X}_4)/3 - (\bar{X}_2 + \bar{X}_5)/2$, 
but with unequal m, the value of D' should be based on weighted averages, 
thus 

$$D' = \frac{m_1 \bar{X}_1 + m_3 \bar{X}_3 + m_4 \bar{X}_4}{m_1 + m_3 + m_4} - \frac{m_2 \bar{X}_2 + m_5 \bar{X}_5}{m_2 + m_5}$$ 
For the sampling error variances we have 

$$s_D^2 = s_w^2 \left( \frac{1}{m_a} + \frac{1}{m_b} \right)$$ 

$$s_{D'}^2 = s_w^2 \left( \frac{1}{m_1 + m_3 + m_4} + \frac{1}{m_2 + m_5} \right)$$ 

If all $m_g = m$, the latter simplifies to 

$$\frac{s_w^2}{m} \left( \frac{1}{3} + \frac{1}{2} \right) \quad \text{or as} \quad \frac{s_w^2}{m} \left( \frac{1}{a} + \frac{1}{b} \right)$$ 

where a and b are the numbers of group means being averaged. Note that 
the latter yields $s_D^2$ as a special case. The required standard errors, 
$s_D$ and $s_{D'}$, are the square roots of the foregoing variances. 
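
The weighted contrast and its standard error can be sketched in Python as follows. This is an illustration under stated assumptions: the error variance is taken to be the within-groups variance estimate, and the function name `weighted_contrast`, the group means, the group sizes, and the variance figure are all hypothetical.

```python
from math import sqrt


def weighted_contrast(means, sizes, first, second, error_variance):
    """Weighted difference between two sets of group means, with its standard error.

    means, sizes: per-group means and numbers of cases.
    first, second: index lists naming the groups on each side of the contrast.
    error_variance: assumed here to be the within-groups variance estimate.
    """
    n_first = sum(sizes[i] for i in first)
    n_second = sum(sizes[i] for i in second)
    mean_first = sum(sizes[i] * means[i] for i in first) / n_first
    mean_second = sum(sizes[i] * means[i] for i in second) / n_second
    contrast = mean_first - mean_second
    std_error = sqrt(error_variance * (1 / n_first + 1 / n_second))
    return contrast, std_error


# Hypothetical data: G = 5 groups of 5 cases; groups 1, 3, 4 versus groups 2, 5.
means = [10.0, 12.0, 11.0, 9.0, 14.0]
sizes = [5, 5, 5, 5, 5]
d_prime, se_d_prime = weighted_contrast(means, sizes, [0, 2, 3], [1, 4], 6.0)
print(d_prime, round(se_d_prime, 3))
```

Passing single-element index lists gives the simple two-mean case, so one routine covers both kinds of contrast.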

For the a priori hypothesis situation, we have D/sr, and D'U *e , + - 

gg&SSSSSSS 

quire equal m gi because it can be used for both Z) and n' ^ 

sr.-sft» 

IS. iitl.."rfoTS-T." b ',r ried “ ,i8 “ a “"" “» 

and when equality of variance bids X !ad V** J* VaHd ^ for ^ *». 
discussion by Scheffé of his S-method and the Tukey T-method in H. Scheffé, 
The analysis of variance, New York: John Wiley and Sons, 1959. 
Apparently this method is the best yet devised for contrasts or 
comparisons of the D' type, but for those of the D type it is lacking 
somewhat in sensitivity. However, along with this lack of power we have the 
advantages listed earlier plus the satisfaction of knowing that its usage 
guards against the making of the type I error too frequently when testing 
differences suggested by the data. The error rate, using the ordinary t test 
for such comparisons, increases astonishingly as G increases. 





Chapter 16 

ANALYSIS OF VARIANCE: COMPLEX 


In Chapter 15 an explanation of the fundamental idea of the analysis of 
variance technique was attempted, and applications to relatively simple 
situations were given. In general, these situations involved the testing of 
the significance of the over-all variation of the means for several groups, 
the groups differing on the basis of a single classificatory principle. Such 
setups are sometimes referred to as single variable experiments, by which 
is meant that there is just one independent variable. For example, income 
might be considered a variable which is dependent in part on amount of 
education, which accordingly becomes the independent, single variable for 
classifying individuals into groups. Or it might be that the classificatory 
variable is subject to experimental manipulation, and we wish to determine 
whether variations thereof will lead to performance or response 
differences. The Wright experiment cited in Chapter 15 is an example of this. 

There are times when it is not only feasible but advisable to design the 
experimental setup so as to make one set of data serve for the testing of 
hypotheses regarding the separate influence of two or more independent 
variables. This type of thing has been done for a long time in psychological 
research when it has been possible to classify a total group first one 
way, then another, and perhaps a third way. For example, in order to 
determine some of the possible correlates of measured intelligence, we may 
classify a group of children into urban, suburban, and rural groups; then 
the same children may be classified on some other basis, i.e., the 
classification may be by sex or by grade location or by age. Such a 
procedure in which one variable is considered at a time is 
tantamount to the single variable setup, even though the same batch of 
data is made to answer questions about the effects of different independent 
variables. 

Now it is obvious that, in studying factors associated with intelligence, 
we could make a double classification by classifying our cases simul¬ 
taneously on two of the variables, or a triple classification by using three 
variables, etc. Consider for the moment a double classification based on 
the three rural-urban categories and on sex. This would lead to the assign¬ 
ing of the cases to six groups, each of which would have a mean IQ. 
Instead of having three means for groupings on the basis of the rural-urban 
characteristic, we would now have two sets of such means, one set for 
each sex. Instead of two means for the total group classified by sex, we 
would have three sets of sex means, a set for each of the three residence 
categories. 

This type of breakdown and similar ones where percentages instead of 
means are involved were utilized in psychological research long before the 
advent of the analysis of variance technique. The further breakdown of 
each sex group for residence status (or of residence groups for sex) is made 
in order to see whether rural-urban differences hold for the sexes separately 
(or whether the sex differences are similar for each of the separate residence 
groups). Although researchers were not confined to the single variable 
approach before the invention of the variance technique, they were defi¬ 
nitely limited in the possible statistical treatment of their data. Now that 
we have the analysis of variance method, we have an adequate statistical 
technique for checking such hypotheses as can be formulated concerning 
the influence of not only one but two or more variables. The advantages of 
using analysis of variance for such situations may be briefly mentioned. 

First, as we have already seen, it provides an over-all test of the signifi¬ 
cance of the difference between two or more means when either large or 
small samples are involved. 

Second, we shall soon see that it leads to a definitely improved estimate 
of sampling error when double or triple or higher-order classification is 
involved. For instance, when the older method is used to check the 
significance of the difference between the two sex means for the total 
group, the determination of the sampling error makes no allowance for 
likely heterogeneity in intelligence associated with residence status. The 
variance method permits a refined estimate of error by allowing for varia¬ 
tion due to one or more variables when the differences between groups 
classified on the basis of some other variable are being tested. 

Third, the variance technique provides a means of testing whether the 
influence of one independent variable on the dependent variable is similar 
for subgroups formed on the basis of a second independent variable. In a 
sex-by-residence analysis of IQs, the breakdown of each residence group 
by sex will likely show that the sex differences are not exactly the same for 
the three groups and that rural-suburban-urban differences are not exactly 
alike for the separate sex groups. Such inconsistencies as seem apparent 
from examination of the six cell means may not be real for the simple 
reason that random sampling errors are present. Before the development 
of the variance technique there was no way of testing such apparent 
inconsistencies, except when each classificatory characteristic led to just 
two categories. 

This last point has to do with what has been termed interaction , a concept 
which is not easily understood. Rather than provide a detailed discussion 
now of what is meant by interaction, we will give a simple illustration. 
Suppose it has been found that one learning method has a distinct advan¬ 
tage over a second method, but that, when the data are broken down for 
two recall intervals, the superiority of the first method seems to hold only 
for those with the shorter recall interval. This failure of the first method 
to be consistently better becomes an example of interaction. Before 
concluding that there is evidence for real interaction, we need to apply 
a statistical test. For such a simple breakdown, we could compute the 
difference between the first and second method means, and the standard 
error of the difference, for those with the short recall interval; likewise 
for those with the long interval; then we could determine the difference 
between the differences and its standard error and therefrom obtain either 
a z or a t as a test of inconsistency. But, when we think of a situation with 
three methods and three or four recall intervals, it is immediately obvious 
that such a simple test cannot be applied. 

It is the purpose of this chapter to present the methods of analysis to be 
used when classification into groups is made on the basis of two or more 
variables. These extensions, which are somewhat restricted by the under¬ 
lying assumptions of normality and homogeneity of certain variances, are 
applicable for either large or small samples and are particularly helpful 
with small samples when it seems imperative that we “get the most out of 
the available data.” 


DOUBLE OR TWO-WAY CLASSIFICATION 

Suppose that the individuals (or their scores) are classifiable into C 
groups on the basis of one characteristic or variable and into R groups on 
the basis of a second variable. This would lead to a table with RC cells. 
Let us first examine the setup where we have only RC scores, i.e., one score 
for each cell. It is convenient to let $X_{rc}$ stand for the score in the 
rth row and cth column of such a table. The score in the first row (from 
the top) and third column would be symbolized as $X_{13}$. The general 
pattern of labeling the scores is set forth in Table 16.1, which also 
includes along the margins a symbol for the several possible row and column 
means. Note that the first subscript identifies the row and the second the 
column to which a score belongs. The scheme used in denoting means should 
be grasped. Thus $\bar{X}_{.2}$ is the mean for the second column, whereas 
$\bar{X}_{2.}$ is the mean for the second row. The “dot” in the subscript 
indicates the direction of the summing for computing a mean: to get 
$\bar{X}_{.2}$ we sum $X_{r2}$ scores with r taking on values running from 
1 to R. 


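Table 16.1. Schema for labeling scores and means for groups, 
double classification 

                                      Columns 
Rows     1               2               3               ...   c               ...   C               Means 

1        $X_{11}$        $X_{12}$        $X_{13}$        ...   $X_{1c}$        ...   $X_{1C}$        $\bar{X}_{1.}$ 
2        $X_{21}$        $X_{22}$        $X_{23}$        ...   $X_{2c}$        ...   $X_{2C}$        $\bar{X}_{2.}$ 
3        $X_{31}$        $X_{32}$        $X_{33}$        ...   $X_{3c}$        ...   $X_{3C}$        $\bar{X}_{3.}$ 
...
r        $X_{r1}$        $X_{r2}$        $X_{r3}$        ...   $X_{rc}$        ...   $X_{rC}$        $\bar{X}_{r.}$ 
...
R        $X_{R1}$        $X_{R2}$        $X_{R3}$        ...   $X_{Rc}$        ...   $X_{RC}$        $\bar{X}_{R.}$ 

Means    $\bar{X}_{.1}$  $\bar{X}_{.2}$  $\bar{X}_{.3}$  ...   $\bar{X}_{.c}$  ...   $\bar{X}_{.C}$  $\bar{X}$ 
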


The deviation of any score, $X_{rc}$, from the total mean can be expressed 
in terms of the deviation of its row mean from the total mean, 
$(\bar{X}_{r.} - \bar{X})$, plus the deviation of its column mean from the 
total mean, $(\bar{X}_{.c} - \bar{X})$, plus a sort of remainder term which 
represents an individual variation over and above that due to the groups to 
which the score belongs. To secure an expression for this term, we note 
that by definition the term must be the part of the score deviation (from 
the total mean) left over after the sum of the two parts specified above 
has been subtracted. Accordingly, we have 

$$(X_{rc} - \bar{X}) - [(\bar{X}_{r.} - \bar{X}) + (\bar{X}_{.c} - \bar{X})]$$ 

which simplifies to 

$$(X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X})$$ 

We may therefore write the following identity: 

$$(X_{rc} - \bar{X}) = (\bar{X}_{r.} - \bar{X}) + (\bar{X}_{.c} - \bar{X}) + (X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X})$$ 



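With r running from 1 to R, and c taking values from 1 to C, there will, 
of course, be RC individual deviations. We need the sum of their squares. 
If we square the right side of the above identity, we will have the squares 
of the three terms plus three cross-product terms that can be shown to 
vanish when summed. It may be helpful to see how the sum of squares for all 
RC scores can be set up. Suppose we begin by writing the squares of the 
deviations for scores in the first column. Each of these squares will 
involve cross-product terms which are here indicated only by dots. We have 
for the first column scores: 

$$(X_{11} - \bar{X})^2 = (\bar{X}_{1.} - \bar{X})^2 + (\bar{X}_{.1} - \bar{X})^2 + (X_{11} - \bar{X}_{1.} - \bar{X}_{.1} + \bar{X})^2 + \cdots$$ 

$$(X_{21} - \bar{X})^2 = (\bar{X}_{2.} - \bar{X})^2 + (\bar{X}_{.1} - \bar{X})^2 + (X_{21} - \bar{X}_{2.} - \bar{X}_{.1} + \bar{X})^2 + \cdots$$ 

$$(X_{r1} - \bar{X})^2 = (\bar{X}_{r.} - \bar{X})^2 + (\bar{X}_{.1} - \bar{X})^2 + (X_{r1} - \bar{X}_{r.} - \bar{X}_{.1} + \bar{X})^2 + \cdots$$ 

$$(X_{R1} - \bar{X})^2 = (\bar{X}_{R.} - \bar{X})^2 + (\bar{X}_{.1} - \bar{X})^2 + (X_{R1} - \bar{X}_{R.} - \bar{X}_{.1} + \bar{X})^2 + \cdots$$ 

The summing of these squares of deviations for scores of column 1 
involves R cases, i.e., r runs from 1 to R; hence we need a symbol which 
denotes this fact. Let us use $\sum_r$ for this purpose. Note that the 
second right-hand term is the same for every score in column 1; hence its 
summation may be replaced by R times its value. We have then the following 
sum of squares, and by analogy the sums for the other columns: 

1st col.: 

$$\sum_r (X_{r1} - \bar{X})^2 = \sum_r (\bar{X}_{r.} - \bar{X})^2 + R(\bar{X}_{.1} - \bar{X})^2 + \sum_r (X_{r1} - \bar{X}_{r.} - \bar{X}_{.1} + \bar{X})^2$$ 

2nd col.: 

$$\sum_r (X_{r2} - \bar{X})^2 = \sum_r (\bar{X}_{r.} - \bar{X})^2 + R(\bar{X}_{.2} - \bar{X})^2 + \sum_r (X_{r2} - \bar{X}_{r.} - \bar{X}_{.2} + \bar{X})^2$$ 

cth col.: 

$$\sum_r (X_{rc} - \bar{X})^2 = \sum_r (\bar{X}_{r.} - \bar{X})^2 + R(\bar{X}_{.c} - \bar{X})^2 + \sum_r (X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X})^2$$ 

Cth col.: 

$$\sum_r (X_{rC} - \bar{X})^2 = \sum_r (\bar{X}_{r.} - \bar{X})^2 + R(\bar{X}_{.C} - \bar{X})^2 + \sum_r (X_{rC} - \bar{X}_{r.} - \bar{X}_{.C} + \bar{X})^2$$ 

We may now sum over the C columns, and for the results we will need 
double summation signs. Since the first right-hand term does not vary 
from column to column, its sum will be merely C times its value. The second 
right-hand set of terms involves a constant times a variable; hence the 
constant, R, comes from under the summation sign. Finally we have the 
following expression for the sum of squares for the RC scores: 

$$\sum_r \sum_c (X_{rc} - \bar{X})^2 = C \sum_r (\bar{X}_{r.} - \bar{X})^2 + R \sum_c (\bar{X}_{.c} - \bar{X})^2 + \sum_r \sum_c (X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X})^2 \qquad (16.1)$$ 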
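
The additive breakdown in formula (16.1) can be confirmed numerically. The Python sketch below computes each of the four sums of squares directly from its definition for a small table with one score per cell; the function name `two_way_sums_of_squares` and the scores are hypothetical.

```python
def two_way_sums_of_squares(table):
    """Return (total, rows, columns, remainder) sums of squares for an R x C table.

    table: list of R rows, each a list of C scores (one score per cell).
    """
    n_rows, n_cols = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[c] for row in table) / n_rows for c in range(n_cols)]

    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_rows = n_cols * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand_mean) ** 2 for m in col_means)
    ss_remainder = sum(
        (table[r][c] - row_means[r] - col_means[c] + grand_mean) ** 2
        for r in range(n_rows)
        for c in range(n_cols)
    )
    return ss_total, ss_rows, ss_cols, ss_remainder


# Hypothetical scores: R = 4 rows, C = 3 columns.
scores = [[3, 5, 8], [4, 7, 9], [2, 6, 7], [5, 8, 12]]
ss_total, ss_rows, ss_cols, ss_remainder = two_way_sums_of_squares(scores)
print(round(ss_total, 3), round(ss_rows + ss_cols + ss_remainder, 3))
```

The two printed values agree, which is what the vanishing of the cross-product terms implies.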




The reader who is worried about whether the cross-product terms really 
vanish should note that for the cth column the product term 

$$\sum_r (\bar{X}_{r.} - \bar{X})(\bar{X}_{.c} - \bar{X}) = (\bar{X}_{.c} - \bar{X}) \sum_r (\bar{X}_{r.} - \bar{X})$$ 

vanishes because $\sum_r (\bar{X}_{r.} - \bar{X}) = 0$. The other two 
cross-product sums have as one factor the remainder or residual term; we 
have already had examples of a general principle that product terms 
involving residuals vanish. 

From formula (16.1) we see that the total sum of squares can be broken 
into three additive components: between row means with R - 1 degrees 
of freedom, between column means with df of C - 1, and a remainder. 
The degrees of freedom for the last part can be ascertained by a principle 
analogous to that used for getting the $\chi^2$ df for contingency tables. 
The marginal means constitute restrictions on the deviation score entries in 
the rows and columns; when deviation scores for (R - 1)(C - 1) cells are 
filled in, the rest of the entries become fixed; hence df = (R - 1)(C - 1). 
Note that the dfs for the three parts sum to the df for the total sum of 
squares or RC - 1. 

Dividing the three sums of squares by their dfs leads to three variance 
estimates, $s_r^2$ for that based on rows, $s_c^2$ for columns, and 
$s_{rc}^2$ for that based on the remainder, sometimes called error, sum of 
squares. We have two null hypotheses: that the row means are chance 
variations from one population mean, and that the column means are also 
variations from one population mean. As in the simpler situation, if the 
estimate based on rows is larger than expected on the basis of chance, it 
follows that there are real differences between the population means for 
the groups defined by the rows; likewise, for column means. 

In testing the significance of either of the two between-groups variances 
when the RC scores belong to RC individuals, we use the remainder 
variance estimate as the denominator of the F ratio. This involves an 
assumption, to be discussed under the heading “Choice of error term,” 
p. 309. For testing the variation of row means, we have 
$F = s_r^2 / s_{rc}^2$ with $n_1 = R - 1$ and $n_2 = (R - 1)(C - 1)$. For 
column means, $F = s_c^2 / s_{rc}^2$ with $n_1 = C - 1$ and 
$n_2 = (R - 1)(C - 1)$. If an F so defined happens to be less than unity, we 
know at once without reference to the table for F that the variations of the 
given means are insignificant. Note that, since the error variance used in 
the denominator is a residual after the parts of the total associated with 
between-row and between-column variations have been subtracted, it follows 
that we are using as our error term a variance which has been freed of the 
influence of heterogeneity with respect to the two classificatory variables 
being investigated. 
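
The two F ratios for a table with one score per cell can be sketched in Python as below. The function name `two_way_f_ratios` and the scores are hypothetical, and the remainder variance is used as the error term, which rests on the assumption mentioned in the paragraph above.

```python
def two_way_f_ratios(table):
    """F ratios for row means and column means, one score per cell.

    Returns a dict with the two Fs and the dfs (rows, columns, remainder).
    """
    n_rows, n_cols = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(row[c] for row in table) / n_rows for c in range(n_cols)]

    ss_rows = n_cols * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand_mean) ** 2 for m in col_means)
    ss_remainder = sum(
        (table[r][c] - row_means[r] - col_means[c] + grand_mean) ** 2
        for r in range(n_rows)
        for c in range(n_cols)
    )

    df_rows, df_cols = n_rows - 1, n_cols - 1
    df_remainder = df_rows * df_cols
    error_variance = ss_remainder / df_remainder   # remainder (error) estimate
    return {
        "F_rows": (ss_rows / df_rows) / error_variance,
        "F_cols": (ss_cols / df_cols) / error_variance,
        "df": (df_rows, df_cols, df_remainder),
    }


# Hypothetical scores: R = 4 rows, C = 3 columns.
result = two_way_f_ratios([[3, 5, 8], [4, 7, 9], [2, 6, 7], [5, 8, 12]])
print(round(result["F_rows"], 2), round(result["F_cols"], 2), result["df"])
```

Each F is referred to the table of F with the row (or column) df as $n_1$ and the remainder df as $n_2$.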
For many situations involving double classification, it would seem that 
the method just outlined would be definitely limited in usefulness because 
no provision has been made for increasing the size of the sample except 
by using finer grouping on one or both of the independent variables. Finer 
grouping would be possible, though not always feasible or desirable for 
some classificatory variables, such as degree of illumination or amount of 
education or size of type, but for other bases for forming groups there are 
definite limits on the number of groups. For example, in the study of 
reaction time the number of possible groupings for sense modality is 
limited. Actually, the number of cases can be increased by having addi¬ 
tional individuals assigned to each of the RC cells. Before taking up this 
needed modification of the setup, we shall discuss certain specific situations 
where the scheme as presented is of practical use. We are not ignoring the 
possibility that sometimes RC cases are enough for testing hypotheses even 
when both R and C are as small as 4 or 5. 


SIGNIFICANCE OF THE DIFFERENCES BETWEEN CORRELATED MEANS 

Suppose that the RC scores are for R individuals working under C 
different conditions. The mean of a row would be for an individual and 
the mean of a column would be for a specified condition. Let us consider 
the limiting case of C = 2. The between-columns sum of squares, 
$R \sum_c (\bar{X}_{.c} - \bar{X})^2$, may be written as 

$$R(\bar{X}_{.1} - \bar{X})^2 + R(\bar{X}_{.2} - \bar{X})^2$$ 

which we have already shown (p. 268) reduces to 
$(R/2)(\bar{X}_{.1} - \bar{X}_{.2})^2$, or to a function of the difference 
between the two means. 

Let us next examine the remainder or error term. If we turn back to 
p. 292, where we summed over columns, we readily see that the remainder 
sum can be expressed as 

$$\sum_r (X_{r1} - \bar{X}_{r.} - \bar{X}_{.1} + \bar{X})^2 + \sum_r (X_{r2} - \bar{X}_{r.} - \bar{X}_{.2} + \bar{X})^2$$ 

in which the c of formula (16.1) has the explicit values of 1 and 2. Now 
the mean of any row, say the rth, is merely the mean of C = 2 scores; i.e., 
$\bar{X}_{r.} = (X_{r1} + X_{r2})/2$, and the total mean must be the average 
of the two column means, or $\bar{X} = (\bar{X}_{.1} + \bar{X}_{.2})/2$. 
Making these substitutions, we have 

$$\sum_r \left( X_{r1} - \frac{X_{r1} + X_{r2}}{2} - \bar{X}_{.1} + \frac{\bar{X}_{.1} + \bar{X}_{.2}}{2} \right)^2 + \sum_r \left( X_{r2} - \frac{X_{r1} + X_{r2}}{2} - \bar{X}_{.2} + \frac{\bar{X}_{.1} + \bar{X}_{.2}}{2} \right)^2$$ 

which simplifies to 

$$\tfrac{1}{4} \sum_r (X_{r1} - X_{r2} - \bar{X}_{.1} + \bar{X}_{.2})^2 + \tfrac{1}{4} \sum_r (X_{r2} - X_{r1} - \bar{X}_{.2} + \bar{X}_{.1})^2$$ 

These two terms become identical when we change the signs within the 
second parentheses, which change is permissible since the square of a 
function is the same as the square of its negative, e.g., 
$(a)^2 = (-a)^2$. Hence we have 

$$\tfrac{1}{2} \sum_r [(X_{r1} - X_{r2}) - (\bar{X}_{.1} - \bar{X}_{.2})]^2$$ 

Now the first parentheses term is the difference between any individual's 
two scores, say $D_r$, and the second is the difference between the two 
column means, which difference it will be recalled is the same as the mean 
of the differences, $\bar{D}$. We have finally the remainder sum of squares 
as $\tfrac{1}{2} \sum_r (D_r - \bar{D})^2$, or one-half the sum of the 
squares of the difference scores about the mean difference. 

The F for comparing two column means becomes

    $F = \frac{(R/2)(\bar{X}_{.1} - \bar{X}_{.2})^2 / 1}{\frac{1}{2}\sum_r (D_r - \bar{D})^2 / (R - 1)}$,  with $n_1 = 1$ and $n_2 = R - 1$

This reduces to

    $F = \frac{(\bar{X}_{.1} - \bar{X}_{.2})^2}{\sum_r (D_r - \bar{D})^2 / [R(R - 1)]}$

which the reader will recognize as $t^2$ for comparing the difference between
means based on sets of correlated scores with the standard error of the
mean difference estimated by formula (7.1).
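The identity between F and the square of t can be checked numerically. The following is a minimal sketch, not part of the original text: the function name and the five pairs of scores are hypothetical, and the standard error of the mean difference is taken to be the square root of the sum of squared deviations of the differences divided by R(R - 1), as in the reduced expression for F.

```python
from math import sqrt


def paired_f_and_t(first, second):
    """Return (F, t) for R individuals each scored under C = 2 conditions."""
    R = len(first)
    mean1 = sum(first) / R
    mean2 = sum(second) / R
    diffs = [a - b for a, b in zip(first, second)]
    d_bar = sum(diffs) / R                        # equals mean1 - mean2
    ss_d = sum((d - d_bar) ** 2 for d in diffs)   # squares of D about its mean

    # Analysis-of-variance route: between-columns and remainder sums of squares.
    ss_columns = (R / 2) * (mean1 - mean2) ** 2
    ss_remainder = 0.5 * ss_d
    F = (ss_columns / 1) / (ss_remainder / (R - 1))

    # Difference-score route: mean difference over its estimated standard error.
    se_mean_diff = sqrt(ss_d / (R * (R - 1)))
    t = d_bar / se_mean_diff
    return F, t


# Hypothetical scores for R = 5 individuals under two conditions.
condition_1 = [12, 15, 9, 14, 10]
condition_2 = [10, 11, 8, 15, 7]
F, t = paired_f_and_t(condition_1, condition_2)
print(round(F, 4), round(t ** 2, 4))
```

Both printed values agree, since the factor of one-half appears in numerator and denominator alike and cancels.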

We have seen in Chapter 6 that in testing the difference between the
means of correlated scores we can, for the large sample situation, determine
the needed sampling error either from the distribution of differences
between paired scores or by means of the standard error of the difference
formula with the correlational term included. The important thing to note
is that the analysis of variance technique provides a method for testing
the significance of the difference between two or more means based on sets
of correlated scores. The scores may be correlated either because they are
based on the same individuals working under C conditions or having C
trials on some stunt, or because siblings or litter mates are involved
(each of the C groups containing one case from each of R families) or
because we started with R sets of matched individuals, one from each set
being assigned to the several C groups.

The F just discussed has to do with column means. What of the row
means for the given setup? The means of the R rows represent the mean
performance of each of the several individuals, and a test of the significance
of the estimate of variance based on the between-row sum of squares
becomes a test of the significance of individual differences. Since it is
known that individuals do differ on practically all psychological variables,
such a test is usually a trivial test of the obvious, and hence it is seldom
needed. We may, however, have the situation in which we wonder
whether individual variation is significant in the light of known
measurement or response errors. To this question we now turn.

RELIABILITY OF MEASUREMENT 

Suppose the scores in each row represent either the performance of an
individual on different forms of a scale or C measurements for a given
variable. The column means would be the means for the forms or
successive sets of measurements, and the test of the significance between column
means would be a test of the difference between the several form means or
of the difference between the means for the C successive sets of trials. For
form means or for trial means, $F = s^2_c / s^2_{rc}$, as outlined previously, provides
an over-all test of the significance of these correlated means.

In order to understand better the meaning, in this situation, of
$F = s^2_r / s^2_{rc}$, let us again take the limiting case of C = 2; e.g., two forms of
a test have been administered to R individuals. Now on page 295 it was
shown that the remainder sum of squares reduces to $\frac{1}{2}\sum_r (D_r - \bar{D})^2$, in
which $D_r$, a difference score, is expressed as a deviation from the mean of
the R differences; hence this term is $\frac{1}{2}\sum d^2$ (cf. p. 80). Since $S^2_D = \sum d^2 / R$,
the term becomes $\frac{1}{2} R S^2_D$. When we recall the expression for the variance
of difference scores for correlated values (6.6), it is readily seen that the
remainder sum of squares can be written as

    $\frac{1}{2}\sum_r (D_r - \bar{D})^2 = \frac{R}{2}(S^2_1 + S^2_2 - 2 r_{12} S_1 S_2)$

If we make the usual assumption that the two forms have the same
variance, we can let $S^2_1 = S^2_2 = S^2_x$. Then noting that $r_{12}$ is the form
versus form reliability coefficient, $r_{xx}$, we can substitute and get

    $\frac{R}{2}(2 S^2_x - 2 r_{xx} S^2_x) = R S^2_x (1 - r_{xx}) = R S^2_e$

where $S^2_e$ is the error of measurement variance (see p. 147). When we
divide this sum of squares by its df, R - 1, we have $R S^2_e / (R - 1)$. From the
relationship between $S^2$ and $s^2$, and dealing accordingly with $S^2_e$ as the
error of measurement variance, it is seen that $s^2_{rc}$ is an unbiased estimate
of the error of measurement variance. We label this estimate $s^2_e$.

Thus, under the usual assumption of equal form variances, the
remainder sum of squares and the variance estimate based thereon have to do with
errors of measurement. The remainder sum of squares as actually
computed includes an adjustment for possibly differing form means but does
not allow for any difference in form variances. (It will be recalled that
$S^2_e$ computed via $r_{xx}$ is also unaffected by a difference in form means.)
If we have C = 3 or more forms, the remainder term is likewise a base for
an unbiased estimate of error of measurement variance. When we test the
difference between row means, we are actually asking whether the individual
differences are significant in light of the variability due to measurement
errors.
In our earlier discussion of reliability (pp. 145-52), nothing was said
about unbiased estimates. Admitting that $s^2_e$ represents a slight
improvement over the biased $S^2_e$, we next ask whether the use of unbiased estimates
leads to an improvement in the estimate of $r_{xx}$. Reliability is sometimes
defined by way of formula (10.15), i.e., $r_{xx} = 1 - S^2_e / S^2_x$, or in population
terms, $1 - \sigma^2_e / \sigma^2_x$. This definition formula can become a basis for
estimating $r_{xx}$.

If we were to plug in the unbiased estimates, $s^2_e$ and $s^2_x$ (the latter as
an estimate of the common form variance but based on the scores from one
form only), we would have

    $r_{xx} = 1 - \frac{s^2_e}{s^2_x} = 1 - \frac{R S^2_e / (R - 1)}{R S^2_x / (R - 1)} = 1 - \frac{S^2_e}{S^2_x}$

from which we see that, when estimating $r_{xx}$ via variances in the two form
situation, it matters not whether we use unbiased or biased estimates of
the variances.
In some areas of psychology it is feasible to approach the reliability
problem by way of m repeated measurements on each of N individuals.
Feasibility depends on absence of practice or fatigue effects, permitting
us to ignore the ordering of the measurements (as first, or second, and so on).
The m measurements for each person are averaged; we wish to know
the reliability of these average scores, the coefficient for which we will
designate $r_{\bar{x}\bar{x}}$. Following the definition just given, this coefficient is 1
minus the ratio of the error variance of an average score to the variance of
the average scores. These two variances would need to be estimated from the
data, consisting of m measures on each of N individuals.

The mean score for a person will have a sampling error since
it is based on a sample of m measures. The square of the estimated
standard error of the mean score for the ith individual would be
$s^2_{\bar{x}_i} = s^2_i / m$, in which $s^2_i$ is the unbiased estimate of the score (single, not
average) variance within individual i. If we assume that the within
individual score variance is the same from individual to individual, a
better estimate of the within person variance will be obtained by averaging
the separate N estimates. But this is exactly the meaning of $s^2_w$, with
persons taking the place of groups. Therefore $s^2_w / m$ provides an estimate of
the error variance for the mean scores.

To secure an estimate of $\sigma^2_{\bar{x}}$, the variance over persons of their scores as
means, we recall that for the between groups estimate we have $s^2_b = n s^2_{\bar{x}}$.
In the between persons situation, this could be written as $s^2_b = m s^2_{\bar{x}}$, from
which we will take $s^2_b / m$ as an unbiased estimate of $\sigma^2_{\bar{x}}$. Substituting, we
have for the reliability of the average scores

    $r_{\bar{x}\bar{x}} = 1 - \frac{s^2_w / m}{s^2_b / m} = 1 - \frac{s^2_w}{s^2_b}$    (16.2)

which is based on unbiased estimates, computable by formulas given earlier.

The foregoing was concerned with the reliability of scores $\bar{X}$, each
an average of m measures. What of the reliability of a single measure?
A score X, one score per person, can be expressed as $X = X_t + E$, and
for N infinitely large we have $\sigma^2_x = \sigma^2_t + \sigma^2_e$. The reliability of
the X scores is

    $r_{xx(pop)} = \frac{\sigma^2_t}{\sigma^2_x} = 1 - \frac{\sigma^2_e}{\sigma^2_x}$


An estimate of $r_{xx(pop)}$ can be obtained by replacing $\sigma^2_t$ and $\sigma^2_x$ by their best
available estimates. With m measures on each of N persons we shall have
a one-way analysis of variance setup, random model. The within groups
estimate, $s^2_w$, as a within persons estimate, is an unbiased estimate of
$\sigma^2_e$, and the expectation for the between persons estimate is given by

    $s^2_b \to \sigma^2_e + m\sigma^2_t$

in which $\sigma^2_t$ is the population variance of true scores, the true score for an
individual being the average of m measures when m is infinitely large. (It
will be recalled that the arrow refers to the expected value.) Replacing
$\sigma^2_e$ by its estimate, $s^2_w$, we have as an estimate

    $\sigma^2_t = \frac{s^2_b - s^2_w}{m}$

Substituting this and the estimate of $\sigma^2_e$, given by $s^2_w$, into
$\sigma^2_x = \sigma^2_t + \sigma^2_e$ leads to

    $\sigma^2_x = \frac{s^2_b - s^2_w}{m} + s^2_w = \frac{s^2_b - s^2_w + m s^2_w}{m}$

which can be written as

    $\sigma^2_x = \frac{s^2_b + (m - 1)s^2_w}{m}$

Then substituting the above estimates into $r_{xx(pop)} = \sigma^2_t / \sigma^2_x$, we have
as an estimate

    $r_{xx} = \frac{(s^2_b - s^2_w)/m}{[s^2_b + (m - 1)s^2_w]/m} = \frac{s^2_b - s^2_w}{s^2_b + (m - 1)s^2_w}$    (16.3)


Formula (16.3) has the same form as the coefficient r' given on p. 285;
thus r', in the situation involving m measures on each of N persons, is a
measure of the reliability of the X scores, when taken singly.

Earlier (p. 208) we derived the generalized Brown-Spearman formula
as a coefficient for the reliability of a sum (or average) of m scores:

    $r_{\bar{x}\bar{x}} = \frac{m r_{xx}}{1 + (m - 1) r_{xx}}$

Let us substitute $r_{xx}$ from equation (16.3), with $r_{\bar{x}\bar{x}}$ standing for the
reliability of the average of m scores:

    $r_{\bar{x}\bar{x}} = \frac{m \dfrac{s^2_b - s^2_w}{s^2_b + (m - 1)s^2_w}}{1 + (m - 1)\dfrac{s^2_b - s^2_w}{s^2_b + (m - 1)s^2_w}}$

    $= \frac{m(s^2_b - s^2_w)}{s^2_b + (m - 1)s^2_w + (m - 1)s^2_b - (m - 1)s^2_w}$

    $= \frac{m(s^2_b - s^2_w)}{m s^2_b} = 1 - \frac{s^2_w}{s^2_b}$
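The agreement between formula (16.2) and the Brown-Spearman stepping-up of formula (16.3) can be illustrated with a few lines of arithmetic. What follows is a sketch under stated assumptions: the function names and the small data set (N = 4 persons, m = 3 measures each) are hypothetical and serve only to show the computation.

```python
def mean_squares(scores):
    """scores: list of N persons, each a list of m repeated measures.

    Returns (s2_b, s2_w), the between-persons and within-persons
    variance estimates of a one-way analysis of variance.
    """
    N = len(scores)
    m = len(scores[0])
    grand_mean = sum(sum(person) for person in scores) / (N * m)
    person_means = [sum(person) / m for person in scores]
    ss_b = m * sum((pm - grand_mean) ** 2 for pm in person_means)
    ss_w = sum((x - pm) ** 2
               for person, pm in zip(scores, person_means)
               for x in person)
    return ss_b / (N - 1), ss_w / (N * (m - 1))


def reliability_of_average(s2_b, s2_w):
    """Formula (16.2): reliability of the average of m measures."""
    return 1 - s2_w / s2_b


def reliability_of_single(s2_b, s2_w, m):
    """Formula (16.3): reliability of a single measure."""
    return (s2_b - s2_w) / (s2_b + (m - 1) * s2_w)


def brown_spearman(r_single, m):
    """Generalized Brown-Spearman formula for an average of m scores."""
    return m * r_single / (1 + (m - 1) * r_single)


# Hypothetical data: 4 persons, 3 measures per person.
data = [[10, 12, 11], [14, 15, 13], [8, 9, 10], [12, 11, 13]]
m = len(data[0])
s2_b, s2_w = mean_squares(data)
r_single = reliability_of_single(s2_b, s2_w, m)
r_average = reliability_of_average(s2_b, s2_w)
print(s2_b, s2_w, round(r_single, 4), round(r_average, 4))
print(round(brown_spearman(r_single, m), 4))
```

For these made-up scores the between and within estimates are 13 and 1, so the single-measure coefficient is .80 and both routes to the reliability of the average give 12/13.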


Thus the Brown-Spearman formula gives the value yielded by formula (16.2).

It follows that a significant between-individuals variance is evidence for
some degree of reliability. But we cannot conclude from this that the test
has satisfactory reliability, since coefficients as low as .30, or even lower,
can be statistically different from zero if N is sufficiently large. The author
does not recommend this approach.

A word is needed concerning the meaning of "error." Sometimes it is
the error associated with the sampling of individuals, and frequently the
sources of variation are unknown. The variance to be used as the
denominator of the F ratio may be any one of, or a combination of, the many
types of error. In this sense the variance estimate based on the remainder
sum of squares may be the error variance even for those situations where we
have classifications into R groups rather than as R individuals, but as will
presently be seen the term which we are now calling the remainder may not
always be the one to utilize as "error." The within-groups variance estimate
of Chapter 15 was an "error" variance for testing the significance of the
between-groups variation. In more complex setups in the analysis of
variance, rules are required for choosing the appropriate error term.


COMPUTATIONAL ILLUSTRATION 

The required computations for testing variation between column means
and between row means will now be set forth. It makes no difference in
the computational procedure whether we have RC individuals classified
into R groups one way and C groups another way or R individuals with
C scores each or R sets of C individuals matched or RC scores for just one
individual.

The computation of the required sums of squares involves an extension
of formulas (15.6), (15.7), and (15.8):

    $\sum_r\sum_c (X_{rc} - \bar{X})^2 = \frac{1}{RC}\left[RC\sum_r\sum_c X_{rc}^2 - \left(\sum_r\sum_c X_{rc}\right)^2\right]$    for total (16.4)

    $R\sum_c (\bar{X}_{.c} - \bar{X})^2 = \frac{1}{RC}\left[C\sum_c\left(\sum_r X_{rc}\right)^2 - \left(\sum_r\sum_c X_{rc}\right)^2\right]$    for columns (16.5)

    $C\sum_r (\bar{X}_{r.} - \bar{X})^2 = \frac{1}{RC}\left[R\sum_r\left(\sum_c X_{rc}\right)^2 - \left(\sum_r\sum_c X_{rc}\right)^2\right]$    for rows (16.6)

The sum of squares for the remainder can be obtained by subtracting the 
sums for between columns and for between rows from the total sum of 
squares. Formulas (16.4-16.6) may look forbidding at first, but actually 
the sums based on raw scores are easily secured by following a plan on the 
work sheet. Sum each row, and write the sums on the right-hand margin; 
sum each column, and write the sums along the bottom margin. Summing 
down the right-hand margin gives the total sum, and summing across 
the bottom margin should give the same total sum. Square all scores and 
sum to get the first sum in (16.4); square all the right-hand margin sums 
and then sum to get the first part of (16.6); square all the bottom margin 
sums and then sum to get the first part of (16.5). 

The student may do well to sit down at a calculator and perform these
operations with the scores in Table 16.2, which contains visual acuity data
on four (= R) individuals for three (= C) distances of the stimulus from
the eye. Casual examination of the table indicates that acuity measures are
influenced by distance. Do the means for the three distances differ
significantly?

Table 16.2. Data for visual acuity, 4 individuals, 3 distances
(Monocular, vernier method, coded scores)*

                   Distance (in Meters)
    Subjects        5       10       15      $\sum_c X$    $\bar{X}_{r.}$
       1           13       29       17         59         19.7
       2            4        9       19         32         10.7
       3            8       30       37         75         25.0
       4            9       27       53         89         29.7
    $\sum_r X$     34       95      126        255
    $\bar{X}_{.c}$  8.5     23.7     31.5       21.2 = $\bar{X}$

    $\sum_r\sum_c X = 255$
    $\sum_r\sum_c X^2 = 7709$
    $\sum_r(\sum_c X)^2 = 18,051$
    $\sum_c(\sum_r X)^2 = 26,057$

* From Walker, E. L., Factors in vernier acuity and distance discrimination, Doctoral
Dissertation, Stanford University, California, 1947.

The required sums are also included in the table. Substituting these in
the foregoing formulas gives:

    $\frac{1}{12}[12(7709) - (255)^2] = 2290.25$ for the total sum of squares
    $\frac{1}{12}[3(26,057) - (255)^2] = 1095.50$ for between-columns sum of squares
    $\frac{1}{12}[4(18,051) - (255)^2] = 598.25$ for between-rows sum of squares

Subtracting the sum of the last two from the total gives 596.50 as the
remainder sum of squares.


Table 16.3. Variance table for data of Table 16.2

    Source        Sum of Squares     df     Variance Estimate
    Distance          1095.50         2          547.75
    Subjects           598.25         3          199.42
    Remainder          596.50         6           99.42
    Total             2290.25        11

These results are assembled in Table 16.3 along with the dfs and the
variance estimates. For the influence of distance we have F = 547.75/99.42
= 5.51, which for $n_1 = 2$ and $n_2 = 6$ is significant at slightly better than the
P = .05 level (additional data in Walker's dissertation leave no doubt that
distance does have an effect). This is a situation in which experimentally
induced differences are so large that they can be demonstrated with only
four cases.
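The work-sheet plan for formulas (16.4) to (16.6) can be followed step by step in code. The sketch below uses the twelve scores of Table 16.2; the variable names are illustrative only and are not the author's notation.

```python
# Scores from Table 16.2: rows are the 4 subjects, columns the 3 distances.
scores = [
    [13, 29, 17],
    [4, 9, 19],
    [8, 30, 37],
    [9, 27, 53],
]
R, C = len(scores), len(scores[0])

# Marginal sums, as written on the right-hand and bottom margins.
row_sums = [sum(row) for row in scores]
col_sums = [sum(row[c] for row in scores) for c in range(C)]
total_sum = sum(row_sums)
sum_of_squares = sum(x * x for row in scores for x in row)

# Formulas (16.4), (16.5), (16.6) and the remainder by subtraction.
ss_total = (R * C * sum_of_squares - total_sum ** 2) / (R * C)
ss_columns = (C * sum(t * t for t in col_sums) - total_sum ** 2) / (R * C)
ss_rows = (R * sum(t * t for t in row_sums) - total_sum ** 2) / (R * C)
ss_remainder = ss_total - ss_columns - ss_rows

# Variance estimates and the F for distance (columns).
var_columns = ss_columns / (C - 1)
var_rows = ss_rows / (R - 1)
var_remainder = ss_remainder / ((R - 1) * (C - 1))
F_distance = var_columns / var_remainder

print(ss_total, ss_columns, ss_rows, ss_remainder)
print(round(var_columns, 2), round(var_rows, 2), round(var_remainder, 2))
print(round(F_distance, 2))
```

The printed sums of squares, variance estimates, and F reproduce the entries of Table 16.3.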

DOUBLE CLASSIFICATION WITH MORE THAN ONE SCORE PER CELL

Suppose that we have m scores in each cell of schematic Table 16.1.
This would lead to a mean for each cell, and about each such mean we
would have the variation of m scores. The mean for the rth row would be
the mean of all mC scores in the row, or the mean of the C cell means of the
row; the mean of the cth column would be the mean of the mR scores in
the column, or the mean of the cell means in the column; in the remainder
term, previously defined as $(X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X})$, we would replace $X_{rc}$
by $\bar{X}_{rc}$. The total sum of squares for all mRC scores would include a
between-column, a between-row, and a remainder component, plus an
additional part which would involve the variation within the cells about the
cell means. A convenient label for this new part would be $\sum\sum (X - \bar{X}_{rc})^2$,
in which it is understood that there are m such deviations in each cell. A
more precise notation would be $\sum_r\sum_c\sum_i (X_{rci} - \bar{X}_{rc})^2$, in which $X_{rci}$ is the
ith score in the cell involving the rth row and the cth column.

Table 16.4. Variance schema for double classification with m scores per cell

    Source          Sum of Squares                                                          df                Variance Estimate
    Rows            $mC\sum_r (\bar{X}_{r.} - \bar{X})^2$                                   R - 1             $s^2_r$
    Columns         $mR\sum_c (\bar{X}_{.c} - \bar{X})^2$                                   C - 1             $s^2_c$
    Interaction     $m\sum_r\sum_c (\bar{X}_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X})^2$   (R - 1)(C - 1)    $s^2_{rc}$
    Within cells    $\sum_r\sum_c\sum_i (X_{rci} - \bar{X}_{rc})^2$                         mRC - RC          $s^2_w$
    Total           $\sum_r\sum_c\sum_i (X_{rci} - \bar{X})^2$                              mRC - 1

The variance table would take on the form indicated in Table 16.4, in
which the term "remainder" has been replaced by "interaction." Note
that the first two sums of squares are simply m times the corresponding
sums for one score per cell, and that the dfs for these sums and for the one
corresponding to the remainder sum are not changed. The df for the
within cells sum depends on the fact that there are m - 1 degrees of
freedom in each of the RC cells, which gives RC(m - 1) = mRC - RC
as the df. We now have four estimates, $s^2_r$, $s^2_c$, $s^2_{rc}$, and $s^2_w$, of variance.

This simple modification of the setup for the analysis of variance leads
to two definite advantages. We can increase the precision or dependability
of our results by basing the analysis on more scores or cases, and we can
test the possible significance of the interaction component. Before we
discuss the first advantage, it is necessary that we consider the question of
possible interaction, the exposition of which is facilitated by an example
which will also serve to illustrate the required computations.

The computational formulas are extensions of previously used formulas.
$\sum X$ and $\sum X^2$ are calculated for each cell. Summing the RC $\sum X^2$ values
gives $\sum\sum X^2$ as the sum of all the mRC squared scores. Summing the $\sum X$
values in each row gives $\sum X_{r.}$, and summing the $\sum X$ values in each column
gives $\sum X_{.c}$. These become sums along the margins, which marginal values
sum down, and across, to the total sum of the mRC scores, $\sum\sum X$. The
cell sums will be symbolized as $\sum X_{rc}$. The formulas are

    Total sum of squares       = $\frac{1}{mRC}\left[mRC\sum\sum X^2 - \left(\sum\sum X\right)^2\right]$    (16.7)

    Between-rows squares       = $\frac{1}{mRC}\left[R\sum_r\left(\sum X_{r.}\right)^2 - \left(\sum\sum X\right)^2\right]$    (16.8)

    Between-columns squares    = $\frac{1}{mRC}\left[C\sum_c\left(\sum X_{.c}\right)^2 - \left(\sum\sum X\right)^2\right]$    (16.9)

    Within-cells squares       = $\frac{1}{m}\left[m\sum\sum X^2 - \sum_r\sum_c\left(\sum X_{rc}\right)^2\right]$    (16.10)

The interaction sum of squares is obtained as the remainder when the
sum of the last three is subtracted from the total sum of squares.

The data of Table 16.5 are from an experiment on pursuit learning with
two variations as to practice sessions and two variations as to rest interval
between trials. For each combination there are twenty (= m) cases. The
scores are recorded in a 2 by 2 or 4-cell table. Table 16.6 is a work-sheet
layout in which are recorded the sums of scores, sums of squared scores,
and means, for the cells and for the margins. The lower right corner
contains values for the total group of 80 cases. For the sums of squares
(of deviations) we have the following:

    Total:         $\frac{1}{80}[80(7835) - (735)^2] = 1082.1875$
    Rows:          $\frac{1}{80}[2(436^2 + 299^2) - (735)^2] = 234.6125$
    Columns:       $\frac{1}{80}[2(341^2 + 394^2) - (735)^2] = 35.1125$
    Within cells:  $\frac{1}{20}[20(7835) - (217^2 + 219^2 + 124^2 + 175^2)] = 782.4500$
    Interaction:   1082.1875 - (234.6125 + 35.1125 + 782.4500) = 30.0125

Table 16.5. Coded learning scores (sum of scores on 29th and
30th trials) for Koerth pursuit rotor*

                          Practice Sessions
    Rest Interval    5 (M T W Th F)      3 (M W F)
    3 minutes         9 14  6 10          8 10 11 14
                     10 15 10 11          9  7  9 10
                     14 17 10 11          9 12 13 14
                     10  7  8 15         12 13  7 17
                     12  8 14  6          9 12  8 15

    1 minute          2  6  1  9         11 12  9  7
                      5  9  2 11          9  6 11  9
                     14  1  1  8          6  8 11 12
                     14  4 11  5          9  7  4 10
                      6  8  2  5         13  6  7  8

* Data from Renshaw, M. J., The effects of varied arrangements of practice and rest
on proficiency in the acquisition of a motor skill. Unpublished Doctor's Dissertation,
University, California, 1947.
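The sums of squares given by formulas (16.7) to (16.10) can be verified directly from the eighty scores. The sketch below is offered only as a check on the arithmetic. Two entries that run together in the scanned table are read here as "1 9" and "12 9", an assumption supported by the fact that these readings reproduce the cell sums 124 and 175 and the sums of squares 1102 and 1643 recorded on the work sheet. The dictionary layout and variable names are illustrative.

```python
# Cell scores keyed by (rest interval, sessions per week), m = 20 per cell.
cells = {
    ("3 min", 5): [9, 14, 6, 10, 10, 15, 10, 11, 14, 17,
                   10, 11, 10, 7, 8, 15, 12, 8, 14, 6],
    ("3 min", 3): [8, 10, 11, 14, 9, 7, 9, 10, 9, 12,
                   13, 14, 12, 13, 7, 17, 9, 12, 8, 15],
    ("1 min", 5): [2, 6, 1, 9, 5, 9, 2, 11, 14, 1,
                   1, 8, 14, 4, 11, 5, 6, 8, 2, 5],
    ("1 min", 3): [11, 12, 9, 7, 9, 6, 11, 9, 6, 8,
                   11, 12, 9, 7, 4, 10, 13, 6, 7, 8],
}
rest_levels = ["3 min", "1 min"]     # rows
session_levels = [5, 3]              # columns
R, C, m = len(rest_levels), len(session_levels), 20
n_total = m * R * C

cell_sums = {key: sum(values) for key, values in cells.items()}
grand_sum = sum(cell_sums.values())
grand_sum_sq = sum(x * x for values in cells.values() for x in values)
row_sums = [sum(cell_sums[(r, c)] for c in session_levels) for r in rest_levels]
col_sums = [sum(cell_sums[(r, c)] for r in rest_levels) for c in session_levels]

ss_total = (n_total * grand_sum_sq - grand_sum ** 2) / n_total              # (16.7)
ss_rows = (R * sum(t * t for t in row_sums) - grand_sum ** 2) / n_total     # (16.8)
ss_cols = (C * sum(t * t for t in col_sums) - grand_sum ** 2) / n_total     # (16.9)
ss_within = (m * grand_sum_sq - sum(t * t for t in cell_sums.values())) / m  # (16.10)
ss_interaction = ss_total - (ss_rows + ss_cols + ss_within)

print(cell_sums)
print(ss_total, ss_rows, ss_cols, ss_within, round(ss_interaction, 4))
```

Keeping the cell sums in a dictionary mirrors the work-sheet layout: the marginal sums are obtained by adding cell sums along a row or a column, exactly as one would on paper.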


Table 16.6. Sums and means for data of Table 16.5

                                Practice Session
    Rest Interval    5 (M T W Th F)              3 (M W F)                   Totals
    3 minutes        $\sum X_{11} = 217$         $\sum X_{12} = 219$         $\sum X_{1c} = 436$
                     $\sum X^2_{11} = 2543$      $\sum X^2_{12} = 2547$      $\sum X^2_{1c} = 5090$
                     $\bar{X}_{11} = 10.8500$    $\bar{X}_{12} = 10.9500$    $\bar{X}_{1.} = 10.9000$

    1 minute         $\sum X_{21} = 124$         $\sum X_{22} = 175$         $\sum X_{2c} = 299$
                     $\sum X^2_{21} = 1102$      $\sum X^2_{22} = 1643$      $\sum X^2_{2c} = 2745$
                     $\bar{X}_{21} = 6.2000$     $\bar{X}_{22} = 8.7500$     $\bar{X}_{2.} = 7.4750$

    Totals           $\sum X_{r1} = 341$         $\sum X_{r2} = 394$         $\sum\sum X_{rc} = 735$
                     $\sum X^2_{r1} = 3645$      $\sum X^2_{r2} = 4190$      $\sum\sum X^2_{rc} = 7835$
                     $\bar{X}_{.1} = 8.5250$     $\bar{X}_{.2} = 9.8500$     $\bar{X} = 9.1875$

The interaction sum of squares can also be calculated by direct
substitution into the definition formula of Table 16.4, which will involve RC
quantities to be squared, summed, and multiplied by m. We have

    $(10.85 - 10.90 - 8.525 + 9.1875)^2 = (.6125)^2$
    $(10.95 - 10.90 - 9.85 + 9.1875)^2 = (-.6125)^2$
    $(6.20 - 7.475 - 8.525 + 9.1875)^2 = (-.6125)^2$
    $(8.75 - 7.475 - 9.85 + 9.1875)^2 = (.6125)^2$

which when added and multiplied by 20 lead to 30.0125, or the value
obtained by subtraction.

Any reader who is surprised that the above four values involved in
computing the interaction sum of squares directly are numerically equal
should ponder the fact that for the given situation the df for the interaction
term is (2 - 1)(2 - 1) or 1.

Actually, the easiest way to compute the interaction sum of squares for a
2 by 2 table is to work with the four cell sums of scores. The formula is

    $\frac{1}{4m}\left(\sum X_{11} + \sum X_{22} - \sum X_{12} - \sum X_{21}\right)^2$

For this problem we have

    $\frac{1}{80}(217 + 175 - 219 - 124)^2 = \frac{1}{80}(49)^2 = 30.0125$
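The two routes to the interaction sum of squares for a 2 by 2 table, the definition formula applied to the cell means and the shortcut applied to the cell sums, can be compared in a short sketch. The function names are hypothetical; the cell sums are those of Table 16.6, and equal numbers of cases per cell are assumed.

```python
def interaction_ss_2x2(sum_11, sum_12, sum_21, sum_22, m):
    """Shortcut: squared contrast of the four cell sums divided by 4m."""
    contrast = sum_11 + sum_22 - sum_12 - sum_21
    return contrast ** 2 / (4 * m)


def interaction_ss_by_definition(cell_sums, m):
    """m times the sum of squared (cell - row - column + grand) mean deviations.

    cell_sums is a 2 by 2 nested list of cell sums; every cell holds m scores.
    """
    means = [[s / m for s in row] for row in cell_sums]
    row_means = [sum(row) / 2 for row in means]
    col_means = [(means[0][c] + means[1][c]) / 2 for c in range(2)]
    grand_mean = sum(row_means) / 2
    return m * sum(
        (means[r][c] - row_means[r] - col_means[c] + grand_mean) ** 2
        for r in range(2) for c in range(2)
    )


shortcut = interaction_ss_2x2(217, 219, 124, 175, m=20)
direct = interaction_ss_by_definition([[217, 219], [124, 175]], m=20)
print(shortcut, round(direct, 4))
```

With one degree of freedom the four deviations differ only in sign, which is why a single contrast of the cell sums carries all of the interaction.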


Table 16.7. Analysis of variance for pursuit learning

    Source                                   Sum of Squares     df     Variance Estimate
    Rest interval (rows)                         234.6125        1         234.6125
    Sessions (columns)                            35.1125        1          35.1125
    Interaction                                   30.0125        1          30.0125
    Individual differences (within cells)        782.4500       76          10.2954
    Total                                       1082.1875       79


The sums of squares and resulting variance estimates are brought
together in Table 16.7. We have four variance estimates which for the given
situation are all estimates of the same population variance under the null
hypothesis conditions: no row effect, no column effect, and no interaction.
It is appropriate for this table to use $s^2_w$ as the denominator of F to test the
row, the column, and the interaction effects. We have for interaction,
$F_{rc}$ = 30.0125/10.2954 = 2.92, which falls short of the F of about 4.0
required for significance at the .05 level. This indicates that the apparent
failure of the four cell means to be consistent, in either direction, with the
marginal means (or with each other) is attributable to chance fluctuations.
For this particular problem the chance fluctuation is the sampling of
individuals (plus a relatively small component having to do with errors of
measurement).

Next consider the effect on pursuit learning of varying the rest interval
and varying the sessions. For sessions we have $F_c$ = 35.1125/10.2954 =
3.41, which is not large enough to lead us to reject the null hypothesis;
but since nonrejection of the null hypothesis does not prove the hypothesis,
we can conclude only that the effect, if it exists, is not large enough to be
demonstrated by the number of cases used. The rest interval effect is
highly significant as judged by $F_r$ = 234.6125/10.2954 = 22.79, which is
double the F needed for the .001 level of significance.
Now the fact that the interaction is not significant permits us to say
that the rest-interval effect is similar for five sessions and for three sessions
per week. If the interaction had been significant, we would need to qualify
our conclusion about the effect of the rest interval.
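The three F ratios are simple quotients of entries in Table 16.7, each over the within-cells variance estimate. A brief sketch follows; the dictionary keys are labels chosen for illustration, and no probability values are computed, since the significance levels quoted in the text come from the F table.

```python
# Sums of squares and df taken from Table 16.7.
effects = {
    "rest interval": (234.6125, 1),
    "sessions": (35.1125, 1),
    "interaction": (30.0125, 1),
}
ss_within, df_within = 782.4500, 76

# Within-cells variance estimate, used here as the error term for all three tests.
s2_w = ss_within / df_within

f_ratios = {
    name: (ss / df) / s2_w
    for name, (ss, df) in effects.items()
}

print(round(s2_w, 4))
for name, f_value in f_ratios.items():
    print(name, round(f_value, 2))
```

The rounded ratios are 22.79 for rest interval, 3.41 for sessions, and 2.92 for interaction.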

ILLUSTRATIONS OF INTERACTION 

Reference to actual examples of statistically significant interaction may
help clarify its meaning. For this purpose we shall again use some data
on visual acuity from the experiment by Walker.* For visual acuity (low
score, better acuity) by two methods of measurement (depth and vernier)
with binocular and monocular vision, we have means as given in Table 16.8.

Table 16.8. Visual acuity: interaction of type of measurement with eyes

                  Depth     Vernier     Total
    Binocular      .08       1.07        .57
    Monocular      .24       1.50        .87
    Total          .16       1.28        .72

The marginal means are markedly different, and it is readily seen that the
cell means (each based on 108 determinations) are not consistent with the
marginal values. The difference, .24 - .08, is not of the same order as
the difference 1.50 - 1.07; or stated in another way, the two differences
differ from each other. In other words, the amount of difference between
binocular and monocular acuity depends upon the type of measurement.
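The statement that "the two differences differ from each other" amounts to computing a difference of differences among the four cell means. A minimal sketch with the means of Table 16.8 follows; the variable names are illustrative, and nothing here tests significance, which requires the within-cells variance not given in the table.

```python
# Cell means from Table 16.8, keyed by (eyes, type of measurement).
cell_means = {
    ("binocular", "depth"): 0.08,
    ("binocular", "vernier"): 1.07,
    ("monocular", "depth"): 0.24,
    ("monocular", "vernier"): 1.50,
}

# Monocular minus binocular, taken separately for each type of measurement.
diff_depth = cell_means[("monocular", "depth")] - cell_means[("binocular", "depth")]
diff_vernier = cell_means[("monocular", "vernier")] - cell_means[("binocular", "vernier")]

# With no interaction the two differences would be equal and this would be zero.
difference_of_differences = diff_vernier - diff_depth

print(round(diff_depth, 2), round(diff_vernier, 2), round(difference_of_differences, 2))
```

The monocular handicap is .16 for the depth measure and .43 for the vernier measure, so the difference of differences is .27 rather than zero.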

One variable investigated in the experiment was the distance of the
stimulus from the subject. Since distance is an ordered variable, it is
possible to picture the interaction by making a graph, with acuity as
ordinate and distance along the abscissa. Fig. 16.1 shows binocular and
monocular acuity (average of the two types of measures) for the three distances
used. Note the difference between the two curves; the significant
interaction of eyes and distance actually means that the two curves are not
parallel. This lack of parallel behavior of curves is more striking in Fig. 16.2,
which illustrates the interaction of measures with distance, for binocular and
monocular combined. In this study there was also a significant F
for the subjects by distance interaction, from which we conclude that
the relationship between acuity and distance varies from person to person
(see Fig. 16.3).

Walker also investigated the effect of stimulus rod width and size of
aperture. A graph of the results for acuity (ordinate) against rod width
(abscissa) for three apertures (A large, B medium, C small) is given in
Fig. 16.4 as another possible example of interaction except that this time
the interaction is so slight as not to possess statistical significance.
This being the case, it can be said that the effect of rod width is independent
of aperture (and vice versa). Contrast this with the possible conclusion,
based on a highly significant F, that distance affects acuity. When we note
the curves depicted in Fig. 16.2, we see that such a conclusion does not
hold for the depth measure. Thus, significant interaction
always calls for a qualification, sometimes drastic, regarding a main effect.

* Walker, E. L., Factors in vernier acuity and distance discrimination. Unpublished
Doctor's Dissertation, Stanford University, California, 1947.

Fig. 16.1. Simple interaction: eyes by distance.

Fig. 16.2. Simple interaction: measures by distance.

Fig. 16.3. Simple interaction: distance by subjects.

Fig. 16.4. Nonsignificant interaction: aperture by stimulus rod width.

... persons to be affected similarly by the experimental conditions.

CHOICE OF ERROR TERM IN TWO-WAY CLASSIFICATION

... stand for judges (each of whom ...) ... of the classifications ...
typical in that one basis of classification is individuals.

... and each leads to three variance estimates plus a within-cells
estimate in case we have more than
one score per cell. It should be noted that the within-cells scores can stand
for two kinds of replication: we might have replication in the sense of
having carried out the experiment with m persons per cell
(but with different persons from cell to cell), as in Table 16.5, or we might
have a replication of measures on the same person or persons. Thus for
Table 16.2 we could have had m measures per person under each of the C
conditions.

The mathematical derivation of the formulas for the possible expected
values is not so important as the deductions therefrom regarding the
meaning of the several variance estimates. Earlier (pp. 256-64) we attempted
to specify what is estimated by the between-groups estimate, $s^2_b$, under
nonnull conditions. Perhaps the student should turn back and review the
argument that led to specifying the expected value of $s^2_b$ in either of two
forms. In what follows we shall write out the expected values for each case.

The general model for two-way classification may be written as

    $(X_{rck} - \mu) = \alpha_r + \alpha_c + \alpha_r\alpha_c + e_{rck}$    (16.11)

in which the deviation score from the population mean is thought
of in terms of a row contribution, $\alpha_r$, a column contribution, $\alpha_c$, an
interactive effect, $\alpha_r\alpha_c$, and a normal random (error) part, $e_{rck}$.
The subscript k indicates that we have replication, m scores per cell,
with k taking on values 1, 2, ..., m. The scores in any cell are
independent of the scores in all other cells. The $\alpha_r$, $\alpha_c$, and $\alpha_r\alpha_c$ are all
expressed in deviation form, i.e., possess the property that $\sum \alpha_r = 0$,
$\sum \alpha_c = 0$, and $\sum\sum \alpha_r\alpha_c = 0$.

For the so-called fixed effects (sometimes fixed constants, or fixed
factor) model, we replace $\alpha$ by A; thus

    $(X_{rck} - \mu) = A_r + A_c + A_rA_c + e_{rck}$    (16.12)

For the random (sometimes called components of variance) model we
replace $\alpha$ by a; thus

    $(X_{rck} - \mu) = a_r + a_c + a_ra_c + e_{rck}$    (16.13)

and the mixed model can be written (with columns standing for fixed
constants) as

    $(X_{rck} - \mu) = a_r + A_c + a_rA_c + e_{rck}$    (16.14)

The $a_r$, $a_c$, $a_ra_c$, and $a_rA_c$ are all assumed to be random variates from
normally distributed populations of effects having variances $\sigma^2_r$, $\sigma^2_c$, $\sigma^2_{rc}$,
and $\sigma^2_{rC}$. Note that the lower-case subscripts to a $\sigma^2$ refer to random
factors whereas the upper-case subscript refers to a fixed factor. (No such
distinction is needed for subscripts to $s^2$ nor to $\bar{X}$.) For the fixed values
$A_r$, $A_c$, and $A_rA_c$ no assumption as to distribution of effects is required.
Indeed, it is difficult to imagine a distribution of, say, $A_c$ when C = 2. The
"population" of effects consists of just two values, $A_1$ and $A_2$, which
symbols stand for $(\mu_{.1} - \mu)$ and $(\mu_{.2} - \mu)$. Two values, or for that matter
the usual small number of fixed effects, cannot very well be described as to
distribution, hence the differences among them or their variation about
an over-all $\mu$ cannot aptly be described in terms of a $\sigma^2$. Consequently,
in the sequel the variation among them will be specified in terms of $\sum A^2_c$.
Likewise, for the $A_r$ and the $A_rA_c$ we will have $\sum A^2_r$ and $\sum\sum (A_rA_c)^2$,
respectively.

When the m scores per cell represent measurement replication, $s^2_w$ will
be taken as an estimate of $\sigma^2_e$; when the m scores per cell involve m
individuals (measured once), $s^2_w$ will be regarded as an estimate of
individual difference variance, designated $\sigma^2_i$. It is to be understood that $\sigma^2_i$ has
two components: true score variance and error of measurement variance.

We are now ready to examine the various possible situations involving
two-way classification in order to point out just what is being estimated by
$s^2_r$, $s^2_c$, $s^2_{rc}$, and $s^2_w$. Once this is done, we will be in a position to choose
an appropriate variance estimate, if such is available, as the denominator,
or error, term for F. The question of variance homogeneity will be
discussed after a consideration of eight setups (cases) involving two-way
classification (p. 315).

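Case I. Fixed constants model, with m scores (m persons) per cell, a
total of mRC individuals:

$$s_r^2 \to \sigma_i^2 + \frac{mC}{R-1}\sum_r A_r^2$$

$$s_c^2 \to \sigma_i^2 + \frac{mR}{C-1}\sum_c A_c^2$$

$$s_{rc}^2 \to \sigma_i^2 + \frac{m}{(R-1)(C-1)}\sum_r\sum_c (A_r A_c)^2$$

$$s_w^2 \to \sigma_i^2$$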
The general principle in forming an F ratio is to choose two estimates
which differ (in their expected values) by one term only, the term involving
the effect being tested. Accordingly, $s_w^2$ is the correct denominator for
$F_r$, $F_c$, and $F_{rc}$, for testing row, column, and interaction effects, respectively.
Note that interaction, if present, has nothing whatever to do with the main
(row or column) effects. This is true because the interaction is a fixed,
not a random, effect. If the interaction is significant, we must be on guard
in drawing conclusions about the main effects; qualifications will be
needed, as we learned in our discussion (pp. 307-09) of the meaning of
interaction.
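
As a numerical companion to Case I, the following Python sketch computes the four variance estimates from an R by C by m array of scores and forms the three F ratios with $s_w^2$ as the common denominator. It is only an illustration: the function names and the small score array are hypothetical and do not come from the text, and NumPy is assumed to be available.

```python
import numpy as np

def two_way_mean_squares(x):
    """Variance estimates for a two-way layout; x has shape (R, C, m)."""
    x = np.asarray(x, dtype=float)
    R, C, m = x.shape
    grand = x.mean()
    row_means = x.mean(axis=(1, 2))
    col_means = x.mean(axis=(0, 2))
    cell_means = x.mean(axis=2)

    ss_r = m * C * np.sum((row_means - grand) ** 2)
    ss_c = m * R * np.sum((col_means - grand) ** 2)
    # Cell means adjusted for both the row and the column effect
    resid = cell_means - row_means[:, None] - col_means[None, :] + grand
    ss_rc = m * np.sum(resid ** 2)
    ss_w = np.sum((x - cell_means[:, :, None]) ** 2)

    return {
        "s2_r": ss_r / (R - 1),
        "s2_c": ss_c / (C - 1),
        "s2_rc": ss_rc / ((R - 1) * (C - 1)),
        "s2_w": ss_w / (R * C * (m - 1)),
    }

def fixed_model_f_ratios(ms):
    """Case I (fixed constants): every F uses s2_w as the error term."""
    return {
        "F_r": ms["s2_r"] / ms["s2_w"],
        "F_c": ms["s2_c"] / ms["s2_w"],
        "F_rc": ms["s2_rc"] / ms["s2_w"],
    }

# Hypothetical scores: R = 3 rows, C = 2 columns, m = 2 persons per cell
scores = np.array([
    [[3, 5], [6, 8]],
    [[4, 6], [9, 11]],
    [[2, 4], [7, 9]],
])
ms = two_way_mean_squares(scores)
f_ratios = fixed_model_f_ratios(ms)
print(ms)
print(f_ratios)
```

Each F would then be referred to the F table with the degrees of freedom of its numerator and with RC(m - 1) degrees of freedom for the denominator.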

Case II. Fixed constants model, RC individuals, one per cell, with each
measured m times: the expectations for the first three estimates will be
precisely the same as in Case I, but now $s_w^2 \to \sigma_e^2$. We see immediately
that this design leads to difficulties. The resulting $s_w^2$ estimate is useless;
if we did use $s_w^2$ as the denominator for testing, say, $s_r^2$, a significant F
would be meaningless because we would not know whether its significance
was attributable to a real row effect or to real individual differences or to a
combination of the two.

Case III. Fixed constants model, only one person measured m times under
each of the RC conditions: if we replace $\sigma_i^2$ by $\sigma_e^2$ in the set of expected
values for Case I we will have indicated what each $s^2$ estimates. As for
Case I, the appropriate error term for all three Fs is $s_w^2$, but any conclusion
we draw from a significant F must be carefully scrutinized for meaning.
It can mean only that the effect holds for the one person used in the
experiment, with no assurance whatsoever that a repetition of the experiment
with another person, either in the same or in a different laboratory, will
lead to a confirmation of the results. In other words, no generalization is
possible except the trivial one that the effect holds for a particular
individual, a useful generalization if the scientific horizon is limited to one
person.

These three cases exhaust the possible situations for two-way analysis of
variance involving the fixed constants model. If it has occurred to the
reader that each of m cases might be measured under all the RC conditions,
he should be apprised that this would involve three-way classification, to
be discussed later. The important thing to have noted is that clear-cut
results, permitting generalizations to a population of individuals, are
possible only by the setup of Case I. We have listed the other two cases
because it may be helpful to know what not to do. However, we must
point out a possible exception to a sweeping dismissal of Case III: there
are some areas (sensory-perceptual) in experimental psychology for which
experimentally produced effects are so large relative to individual
differences that we can be reasonably sure that similar significant results will
hold for other persons; sure, that is, provided some knowledge of the
extent of individual differences is available. Rarely will the effects be
of the same order of magnitude for two persons; individual by conditions
interaction is the rule rather than the exception.

Case IV. Random model, rows stand for R individuals and columns 
stand for C judges with m (ordinarily m will not exceed 2) ratings by each 
judge on each individual. The ratings, which must be directed toward 
some trait and involve at least a 10-point scale, might be based on observed, 
or on a transcribed record of, behavior of the R individuals. (The judges 
might find it difficult to rule out memory when making two or more ratings 
for each individual.) Instead of C judges making ratings we might have C 
examiners or testers, each testing the R individuals twice on, say, the 
Rorschach. We have a sample of individuals and a sample of judges (or 
examiners). The expected values of the variance estimates are:

$$s_r^2 \to \sigma_e^2 + m\sigma_{rc}^2 + mC\sigma_r^2$$

$$s_c^2 \to \sigma_e^2 + m\sigma_{rc}^2 + mR\sigma_c^2$$

$$s_{rc}^2 \to \sigma_e^2 + m\sigma_{rc}^2$$

$$s_w^2 \to \sigma_e^2$$


It is obvious that $s_w^2$ can be used as the error term for testing the
interactive effect, but since $s_w^2$ is nothing more than an estimate of error
of measurement variance for the ratings, the conclusion from a significant
F is that interaction holds for these particular R individuals and C judges;
there is no assurance that repetition of the investigation with R other
individuals and C other judges would lead to interaction. As for the main
effects, it is obvious that $s_{rc}^2$ becomes the appropriate (and only correct)
term to use for $F_r$ and $F_c$. A significant $F_r$ would mean a dependable
differentiation of individuals over and above the variation due to
measurement error and judge by individual interaction, and a significant $F_c$ would
indicate real variation from judge to judge in a possible population of
judges.

Case V. Random model, same as Case IV except that m = 1. No
estimate of $\sigma_e^2$ is available, but $s_{rc}^2$ would still be the error term for $F_r$
and $F_c$.

Remark: Actually we are hard put to find good illustrations in
psychology for the random model. Any student who attempts to find other
illustrations should keep in mind that it must be possible to classify a 
score simultaneously in two different ways, each involving sampling. 

Case VI. Mixed model, rows stand for individuals (or matched persons),
columns involve C fixed constants (fixed conditions having fixed effects),
and measurement replication leading to m scores per cell:

$$s_r^2 \to \sigma_e^2 + mC\sigma_r^2$$

$$s_c^2 \to \sigma_e^2 + m\sigma_{rC}^2 + \frac{mR}{C-1}\sum_c A_c^2$$

$$s_{rc}^2 \to \sigma_e^2 + m\sigma_{rC}^2$$

$$s_w^2 \to \sigma_e^2$$


The reader will need to recall that lower-case and upper-case letters as
subscripts to a $\sigma^2$ indicate random and fixed factors, respectively. The
interaction term can be tested by $F_{rc} = s_{rc}^2/s_w^2$; if $F_{rc}$ is significant, we can
conclude that the differential responses (failure of the individuals to
maintain the same rank order under the C conditions) are larger than
expected on the basis of errors of measurement. Individual by conditions
interactions are usually found to be significant. It will be recalled that in
the mixed model the interaction term, $a_r A_c$, is regarded as a random
variate, and as such it becomes a source of random variation which, if real,
will affect the between-columns term. We see from the foregoing that $s_{rc}^2$
is the proper error term for testing $s_c^2$. To use $s_w^2$ for this purpose is
simply not defensible; if, for example, $s_c^2/s_w^2$ is significant, it might be so
because of real column differences or because of real interaction or
because of a combination of the two. Ordinarily $s_r^2$ in this situation is not
tested for significance since it reflects individual differences which are
always real unless the measurements are completely unreliable. Indeed, it
would be sad to carry out an experiment without first ascertaining that the
measurements have reliability.

Case VII. Mixed model, same as Case VI except that m = 1 (no
measurement replication). This does not provide an $s_w^2$, which is
nonessential anyway, but $s_{rc}^2$ is again the proper error term for testing $s_c^2$. The
setup in Table 16.2 falls under this case. In psychological research, Case
VII is used quite frequently; it provides a significance test for the
differences among a series of C correlated means, correlated for reasons
previously specified (pp. 295-96).
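
The Case VII test can be sketched in a few lines of Python. The sketch below takes an R by C array (R individuals, each measured once under C fixed conditions) and forms $F_c = s_c^2/s_{rc}^2$. The function name and the scores are hypothetical, introduced only for illustration, and NumPy is assumed.

```python
import numpy as np

def correlated_means_f(x):
    """F for C correlated means; x has shape (R, C), one score per cell."""
    x = np.asarray(x, dtype=float)
    R, C = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)

    ss_c = R * np.sum((col_means - grand) ** 2)
    # Remainder (individual by conditions interaction plus error)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ss_rc = np.sum(resid ** 2)

    df_c = C - 1
    df_rc = (R - 1) * (C - 1)
    f_c = (ss_c / df_c) / (ss_rc / df_rc)
    return f_c, (df_c, df_rc)

# Hypothetical scores: 4 individuals under 3 conditions
scores = np.array([
    [10, 12, 15],
    [8, 11, 13],
    [12, 13, 18],
    [9, 12, 14],
])
f_c, df = correlated_means_f(scores)
print(f_c, df)
```

With only two conditions (C = 2) this F equals the square of the t for the difference between two correlated means, which offers a convenient check on the arithmetic.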

Case VIII. Mixed model, R rows stand for R individuals and columns
stand for C forms of a test (the reliability of measurement setup discussed
on pp. 296-97):

$$s_r^2 \to \sigma_e^2 + C\sigma_r^2$$

$$s_c^2 \to \sigma_e^2 + \frac{R}{C-1}\sum_c A_c^2$$

$$s_{rc}^2 \to \sigma_e^2$$

It will be recalled that $s_{rc}^2$, which was earlier (p. 293) labeled a remainder
term, was shown to depend solely on errors of measurement under the
assumptions usually made in connection with test reliability. We see now
that these assumptions involve the a priori assumption of no interaction,
an assumption which implies, among other things, that possible practice
effects are not different from person to person. Note that in case
interaction is operating, $s_{rc}^2$ will involve an interaction component (as in Case
VII); hence $s_{rc}^2$ is the appropriate error term, regardless of whether there
is or is not interaction, for $F_c$ as a test of the difference between the C form
means or over-all practice effects or both (we would not know which). But
a test of the significance of $s_r^2$ requires the assumption of no interaction.

Remark about measurement replication: We have seen that having $s_w^2$
as an estimate of $\sigma_e^2$ does not provide us with a useful error term (for F) in
the testing of hypotheses about main effects (and sometimes about
interaction) under any of the three mathematical models. This illustrates a
general principle: when an estimate of error of measurement variance is
used as the denominator of F, no generalization to a population of persons
is possible, and hence no generalization of import to science. This raises
the question as to whether measurement replication is worth while. The
answer is yes, particularly when it is known that a single measurement is
not very reliable. By replicating measurement we will obtain more reliable
scores in the form of the average of m values; hence one source of
variability in the data will be reduced. The student who has not noticed that
the analyses involving measurement replication are, in essence, dealing
with average scores for individuals should ponder further.

Homogeneity of variance assumption. For Cases I and II it is
assumed that individual difference variance is the same from cell to cell.
For Cases III through VIII it is assumed that error of measurement
variance is homogeneous from cell to cell. The assumption is testable
(say, by Bartlett’s test, p. 249) only for Cases I, III, IV, and VI.

An additional assumption, seldom mentioned in textbooks of applied
statistics, is required for those cases where, for C greater than 2, $s_{rc}^2$ is
used as the error term. This is the assumption of homogeneity of
interaction variances, the meaning of which may not be quickly obvious to the
reader. Let us consider the mixed model, with rows standing for
individuals and with columns for levels on some factor. The interaction sum
of squares involves

$$(X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X})$$

which, it will be recalled from p. 291, was a simplification of the remainder

$$(X_{rc} - \bar{X}) - [(\bar{X}_{r.} - \bar{X}) + (\bar{X}_{.c} - \bar{X})]$$
from which we see that every deviation being squared and summed to get
the interaction sum of squares is one in which the deviation $(X_{rc} - \bar{X})$ is
twice adjusted, once for the row effect and once for the column effect.
Suppose that the adjustment has been made for the column effect and that
we examine what is left, which is $(X_{rc} - \bar{X}) - (\bar{X}_{r.} - \bar{X})$, or simply
$(X_{rc} - \bar{X}_{r.})$.

Thus, after adjustment for column effect, the interaction sum of squares
is represented by $\sum_r \sum_c (X_{rc} - \bar{X}_{r.})^2$, which in turn can be written as

$$\sum_r (X_{r1} - \bar{X}_{r.})^2 + \sum_r (X_{r2} - \bar{X}_{r.})^2 + \cdots + \sum_r (X_{rc} - \bar{X}_{r.})^2 + \cdots + \sum_r (X_{rC} - \bar{X}_{r.})^2$$

each term of which is based on R degrees of freedom but, as we already
know, the interaction sum of squares is based on $(R - 1)(C - 1)$ rather than
$RC$ degrees of freedom. The important thing to note is that the
interaction sum of squares is made up of C components, every one of
which when divided by R provides a variance estimate. In effect, when we
combine these C sums of squares as a basis for estimating the interaction
variance we are averaging C separate estimates (with due allowance for
degrees of freedom).

The ramifications of this interaction business might be better understood
by pursuing further the foregoing breakdown into C parts. Consider the
contribution of the first column, $\sum_r (X_{r1} - \bar{X}_{r.})^2$. The $X_{r1}$ are variates with
mean $\bar{X}$ and the $\bar{X}_{r.}$ are also variates with mean $\bar{X}$, hence the differences
$(X_{r1} - \bar{X}_{r.}) = (x_{r1} - \bar{x}_{r.})$ in deviation units. Then

$$\sum_r (x_{r1} - \bar{x}_{r.})^2 = \sum_r x_{r1}^2 + \sum_r \bar{x}_{r.}^2 - 2\sum_r x_{r1}\bar{x}_{r.} = R S_{x_{r1}}^2 + R S_{\bar{x}_{r.}}^2 - 2R\, r_{x_{r1}\bar{x}_{r.}} S_{x_{r1}} S_{\bar{x}_{r.}}$$

where $S_{x_{r1}}^2$ is the variance (not the unbiased estimate) of the scores in the
first column, $S_{\bar{x}_{r.}}^2$ is the variance of the distribution of the row means,
and $r_{x_{r1}\bar{x}_{r.}}$ is the correlation between these two variates. Similarly for the
second column and the cth column we have

$$R S_{x_{r2}}^2 + R S_{\bar{x}_{r.}}^2 - 2R\, r_{x_{r2}\bar{x}_{r.}} S_{x_{r2}} S_{\bar{x}_{r.}}$$

$$R S_{x_{rc}}^2 + R S_{\bar{x}_{r.}}^2 - 2R\, r_{x_{rc}\bar{x}_{r.}} S_{x_{rc}} S_{\bar{x}_{r.}}$$

and so on to the Cth column.

We have thus pointed out two possible sources for the heterogeneity of
variance for the subparts entering into the row by column interaction. The
C components can differ either because the variances in the several columns
differ and/or because the degree of correlation between the row means and
the column scores varies from column to column. When the latter
variation occurs, it implies that the $C(C - 1)/2$ intercorrelations among the
columns are heterogeneous, a heterogeneity that can readily be a
by-product of the C experimental conditions or the C levels. Presumably,
violation of the assumption of homogeneity of variances for the
components of interaction will have the usual small effect on the F test;
actually, there is some evidence that such is the case.

Incidentally, since the foregoing development permits us to write

$$\sum_c R S_{x_{rc}}^2 + C R S_{\bar{x}_{r.}}^2 - 2\sum_c R\, r_{x_{rc}\bar{x}_{r.}} S_{x_{rc}} S_{\bar{x}_{r.}}$$

as an expression for the interaction sum of squares after adjustment for
differences in column means, we have the basis for additional insight into
the meaning of interaction for the mixed two-way model. Suppose, for
argument’s sake, that the variance in each column equals the variance of
the distribution of row means and that there is perfect correlation between
the scores in each column and the row means. Under these conditions the
interaction sum of squares is seen to be zero. Next note that as soon as
these correlations cease to be perfect, the interaction sum of squares ceases
to be zero. The lower the correlations, the greater the interaction. Now
since the correlation of the scores in, say, the first column with the row
means will equal the correlation with the row sums (because the means
are merely the sums divided by the constant, C), we can write this r, in
simpler notation (the subscript $\Sigma x$ denoting the row sums), as

$$r_{x_1 \Sigma x} = \frac{\sum x_1 (x_1 + x_2 + \cdots + x_c + \cdots + x_C)}{R\, S_{x_1} S_{\Sigma x}}$$

the numerator of which tells us that this correlation is a function of the
extent to which the scores in the first column correlate with the scores
in the other columns; similarly, for the correlation of any column against
the marginal means. Thus the interaction is a function of the
intercorrelations among the columns: the lower these are, the greater the interaction.
When it is recalled that errors of measurement tend to lower correlations,
it is readily seen that the computed interaction depends in part on
measurement errors, as was specified in the expectations given under Case VI.

The foregoing argument holds, of course, when the variances within 
the columns are unequal. As an exercise the student might consider the 
situation where the variance in the first column is, say, 4 while that for all 
other columns is 1 and where the correlations are all unity, and thereby 
demonstrate that any differences in column variances also contribute to the 
interaction sum of squares. 
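
The exercise just proposed can be checked numerically. The Python sketch below computes the interaction sum of squares in two ways: directly from the doubly adjusted deviations, and from the column variances, the variance of the row means, and the correlations, following the expression developed above. The deviation scores are hypothetical (chosen so that the column variance, computed with divisor R, is 1), the function names are ours, and NumPy is assumed.

```python
import numpy as np

def interaction_ss(x):
    """Direct interaction sum of squares for an R x C table of scores."""
    x = np.asarray(x, dtype=float)
    resid = (x - x.mean(axis=1, keepdims=True)
               - x.mean(axis=0, keepdims=True) + x.mean())
    return float(np.sum(resid ** 2))

def interaction_ss_from_moments(x):
    """Same quantity from variances (divisor R) and correlations."""
    x = np.asarray(x, dtype=float)
    R, C = x.shape
    xc = x - x.mean(axis=0, keepdims=True)   # adjust for column means
    row_means = xc.mean(axis=1)
    s_rows = row_means.std()                 # divisor R, not R - 1
    total = C * R * s_rows ** 2
    for c in range(C):
        s_c = xc[:, c].std()
        r_c = np.corrcoef(xc[:, c], row_means)[0, 1]
        total += R * s_c ** 2 - 2 * R * r_c * s_c * s_rows
    return float(total)

z = np.array([-1.0, 1.0, -1.0, 1.0])        # hypothetical scores, variance 1
equal = np.column_stack([z, z, z])           # equal variances, r = 1 throughout
unequal = np.column_stack([2 * z, z, z])     # first column has variance 4

ss_equal = interaction_ss(equal)
ss_unequal = interaction_ss(unequal)
ss_unequal_moments = interaction_ss_from_moments(unequal)
print(ss_equal, ss_unequal, ss_unequal_moments)
```

With equal column variances and perfect correlations the interaction sum of squares vanishes; making the first column four times as variable, with the correlations still at unity, leaves a nonzero interaction sum of squares, and the two routes of computation agree.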
THREE-WAY CLASSIFICATION

Suppose that we wish to arrange an investigation so as to let one set of 
data serve to determine whether the variation of a dependent variable is 
due to or associated with variation on three independent variables. Again, 
the term independent variable is being used in its broad sense. It might be 
a “real” variable like illumination, temperature, amount of food, length 
of rest interval; or it might be a variable having to do with qualitative 
differences, such as kind of food, type of motivation or incentive, various 
psychological sets. It makes no difference whether the variables are 
manipulatable in the laboratory, as would be true of all those mentioned, 
or whether the desired variation is secured by appropriate choice of cases.

It is necessary that we be able to assign individuals or scores to each 
combination of groupings made possible by whatever classifications we 
have on the three independent variables. Let us suppose that there are C 
categories on one variable, R on another, and B on a third. For purposes 
of exposition and as a systematic way of arranging the data, let the C 
categories define C columns, the R categories R rows, and the B categories 
B blocks. Let $X_{rbc}$ represent the score in the rth row, bth block, and cth
column, and let us assume for the time being that we have only one score
for each combination. Thus $X_{324}$ would be the only score in the third row,
second block, and fourth column. The scores may be arranged in some
such systematic order as that in Table 16.9, which should be studied
carefully by the reader.

Note in particular how the various sums are specified and their location
in the table. The first two subscripts in $\sum_c X_{11c}$ indicate that this sum has to
do with scores in the first row and first block, and that in the summing
process c takes on values running from 1 to C. The general expression for
all such sums is $\sum_c X_{rbc}$. The symbol $\sum_r X_{r11}$ stands for the sum of scores in
the first column and first block; r takes on values of 1 to R. The
corresponding general symbol is $\sum_r X_{rbc}$. In next to the bottom section of the
table will be found $\sum_b X_{1b1}$ as the sum for all the cases in row 1 and column 1,
the summing being through blocks; i.e., b takes on values from 1 to B.
The general expression for such sums is $\sum_b X_{rbc}$. The sum of all the scores
in the first block is symbolized as $\sum_r \sum_c X_{r1c}$, and in the bth block as
$\sum_r \sum_c X_{rbc}$. For the sum of all the scores in the first column, irrespective
of row and block, we have $\sum_r \sum_b X_{rb1}$, and the general expression is
$\sum_r \sum_b X_{rbc}$. The symbol $\sum_b \sum_c X_{1bc}$ stands for the sum of all scores in the
first row, and $\sum_b \sum_c X_{rbc}$ is the
corresponding general expression. Note also how the “dot” notation is
used to specify the several means. The subscript which has been replaced 
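Table 16.9. Score and sum schema for three-way classification

                                       Column
            Row    1                       c                       C                       Sum                           Mean

Block 1     1      $X_{111}$               $X_{11c}$               $X_{11C}$               $\sum_c X_{11c}$              $\bar{X}_{11.}$
            r      $X_{r11}$               $X_{r1c}$               $X_{r1C}$               $\sum_c X_{r1c}$              $\bar{X}_{r1.}$
            R      $X_{R11}$               $X_{R1c}$               $X_{R1C}$               $\sum_c X_{R1c}$              $\bar{X}_{R1.}$
            Sum    $\sum_r X_{r11}$        $\sum_r X_{r1c}$        $\sum_r X_{r1C}$        $\sum_r\sum_c X_{r1c}$        $\bar{X}_{.1.}$
            Mean   $\bar{X}_{.11}$         $\bar{X}_{.1c}$         $\bar{X}_{.1C}$                                       $\bar{X}_{.1.}$ (mean, block 1)

Block b     1      $X_{1b1}$               $X_{1bc}$               $X_{1bC}$               $\sum_c X_{1bc}$              $\bar{X}_{1b.}$
            r      $X_{rb1}$               $X_{rbc}$               $X_{rbC}$               $\sum_c X_{rbc}$              $\bar{X}_{rb.}$
            R      $X_{Rb1}$               $X_{Rbc}$               $X_{RbC}$               $\sum_c X_{Rbc}$              $\bar{X}_{Rb.}$
            Sum    $\sum_r X_{rb1}$        $\sum_r X_{rbc}$        $\sum_r X_{rbC}$        $\sum_r\sum_c X_{rbc}$        $\bar{X}_{.b.}$
            Mean   $\bar{X}_{.b1}$         $\bar{X}_{.bc}$         $\bar{X}_{.bC}$                                       $\bar{X}_{.b.}$ (mean, block b)

Block B     1      $X_{1B1}$               $X_{1Bc}$               $X_{1BC}$               $\sum_c X_{1Bc}$              $\bar{X}_{1B.}$
            r      $X_{rB1}$               $X_{rBc}$               $X_{rBC}$               $\sum_c X_{rBc}$              $\bar{X}_{rB.}$
            R      $X_{RB1}$               $X_{RBc}$               $X_{RBC}$               $\sum_c X_{RBc}$              $\bar{X}_{RB.}$
            Sum    $\sum_r X_{rB1}$        $\sum_r X_{rBc}$        $\sum_r X_{rBC}$        $\sum_r\sum_c X_{rBc}$        $\bar{X}_{.B.}$
            Mean   $\bar{X}_{.B1}$         $\bar{X}_{.Bc}$         $\bar{X}_{.BC}$                                       $\bar{X}_{.B.}$ (mean, block B)

Sums        1      $\sum_b X_{1b1}$        $\sum_b X_{1bc}$        $\sum_b X_{1bC}$        $\sum_b\sum_c X_{1bc}$        $\bar{X}_{1..}$
through     r      $\sum_b X_{rb1}$        $\sum_b X_{rbc}$        $\sum_b X_{rbC}$        $\sum_b\sum_c X_{rbc}$        $\bar{X}_{r..}$
blocks      R      $\sum_b X_{Rb1}$        $\sum_b X_{Rbc}$        $\sum_b X_{RbC}$        $\sum_b\sum_c X_{Rbc}$        $\bar{X}_{R..}$
            Sum    $\sum_r\sum_b X_{rb1}$  $\sum_r\sum_b X_{rbc}$  $\sum_r\sum_b X_{rbC}$  $\sum_r\sum_b\sum_c X_{rbc}$  $\bar{X}_{...}$

Means for   1      $\bar{X}_{1.1}$         $\bar{X}_{1.c}$         $\bar{X}_{1.C}$                                       $\bar{X}_{1..}$ (means for rows)
rows by     r      $\bar{X}_{r.1}$         $\bar{X}_{r.c}$         $\bar{X}_{r.C}$                                       $\bar{X}_{r..}$
columns     R      $\bar{X}_{R.1}$         $\bar{X}_{R.c}$         $\bar{X}_{R.C}$                                       $\bar{X}_{R..}$
            Column means  $\bar{X}_{..1}$  $\bar{X}_{..c}$         $\bar{X}_{..C}$                                       $\bar{X}_{...} = \bar{X}$

Fig. 16.5. Geometric picture of three-way classification.
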
by a dot indicates the direction of the addition required to obtain the sum
for the given mean. Thus in $\bar{X}_{.24}$ the dot replaces r; this mean is based on
R scores, with r running from 1 to R when we sum. The subscripts which
are left denote that the mean is for scores in the second block and fourth
column. The total number of means will be as follows:

RB means of the form $\bar{X}_{rb.}$
RC means of the form $\bar{X}_{r.c}$
BC means of the form $\bar{X}_{.bc}$
R means of the form $\bar{X}_{r..}$
B means of the form $\bar{X}_{.b.}$
C means of the form $\bar{X}_{..c}$
One mean of the form $\bar{X}_{...}$ = total mean = $\bar{X}$
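
The dot notation maps directly onto averaging over array axes. The Python sketch below stores the scores in a NumPy array indexed as x[r, b, c] and obtains each family of means by averaging over the subscript that the dot replaces. The array contents are placeholder values, not data from the text, and the dictionary keys are merely our labels for the forms listed above.

```python
import numpy as np

R, B, C = 3, 2, 4
# Placeholder scores, indexed x[r, b, c] (row, block, column)
x = np.arange(R * B * C, dtype=float).reshape(R, B, C)

means = {
    "X_rb.": x.mean(axis=2),        # dot replaces c: average over columns
    "X_r.c": x.mean(axis=1),        # dot replaces b: average over blocks
    "X_.bc": x.mean(axis=0),        # dot replaces r: average over rows
    "X_r..": x.mean(axis=(1, 2)),
    "X_.b.": x.mean(axis=(0, 2)),
    "X_..c": x.mean(axis=(0, 1)),
    "X_...": x.mean(),              # total mean
}

# Number of means of each form: RB, RC, BC, R, B, C, and 1
counts = {name: int(np.size(m)) for name, m in means.items()}
print(counts)
```

For 3 rows, 2 blocks, and 4 columns the counts are 6, 12, 8, 3, 2, 4, and 1.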

Perhaps a better appreciation of the meaning of all these means can be
obtained by a study of Fig. 16.5, which pictures geometrically the situation
for two blocks, three rows, and four columns. The individual scores can
be thought of as in the cubicles of a 2 by 3 by 4 box. Summing through
the box in the vertical direction leads to the 8 means on the top; summing
in the forward-backward direction leads to the 12 means on the front
surface; and summing through right-leftward leads to the 6 means on the
side. Summing the means (or summing sums) across the front leads to the
means placed along the vertical axis for the groups defined by the rows;
summing the means (or sums) downward on the front leads to the means
placed along the right-left axis for the groups defined by the columns;
summing down on the side leads to the means along the third axis for the
groups defined by the blocks. To get any of these means it is, of course,
assumed that the sum involved is divided by the proper number.

Of primary interest is the question: Is the variation among the means
along the edges, considered separately, larger than expected on the basis
of chance? To answer this we need to break down the sum of squares of
deviations from the total mean into appropriate components. The score
$X_{rbc}$ in the cubicle defined by the rth row, bth block, and cth column will
vary more or less from $\bar{X}$, and three possible sources of variation for $X_{rbc}$
are obvious: the deviation of its row mean, its column mean, and its
block mean from $\bar{X}$. Now, if we recall the situation for double
classification, it is fairly obvious that, when the score $X_{rbc}$ is considered as belonging
in row r and column c, one source of variation becomes the remainder
or interaction for rows and columns; considered next as also falling in
row r and block b, another source of variation is the possible interaction
of rows and blocks; and then thought of as belonging to column c and
block b, the score also involves the interaction of columns and blocks.

When the sums of squares for these six components are added, it will be 
discovered that they do not sum to the sum of squares for the total; i.e., 
subtracting these six sums from the total sum leaves a remainder. This 
residual is sometimes referred to as error, more frequently as a three-way 
interaction. This term involves rows, blocks, and columns. The reader, 
having in mind the idea that the simple row by column interaction has to do 
with the possible failure of cell entries to be consistent with the two sets of 
marginal means, must now try imagining that the RBC entries in the 
cubical cells of our box may not be entirely consistent with the three sets of 
means on the edges and with the three sets on the surface. We have seen 
that a statistical check on two-way interaction is not possible with only one 
entry per cell; similarly more than one score per cubicle is required for 
testing three-way interaction. 
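
The breakdown just described can be written as an algebraic identity for a single score. The following display is our restatement in the dot notation of this chapter, not a formula quoted from the text; the last bracket is the residual, or three-way interaction, deviation that remains after the three main-effect and three simple-interaction deviations are removed.

```latex
\begin{aligned}
X_{rbc} - \bar{X} ={}& (\bar{X}_{r..} - \bar{X}) + (\bar{X}_{.b.} - \bar{X}) + (\bar{X}_{..c} - \bar{X}) \\
 &+ (\bar{X}_{rb.} - \bar{X}_{r..} - \bar{X}_{.b.} + \bar{X}) \\
 &+ (\bar{X}_{r.c} - \bar{X}_{r..} - \bar{X}_{..c} + \bar{X}) \\
 &+ (\bar{X}_{.bc} - \bar{X}_{.b.} - \bar{X}_{..c} + \bar{X}) \\
 &+ \bigl[ X_{rbc} - \bar{X}_{rb.} - \bar{X}_{r.c} - \bar{X}_{.bc}
          + \bar{X}_{r..} + \bar{X}_{.b.} + \bar{X}_{..c} - \bar{X} \bigr]
\end{aligned}
```

Collecting terms on the right shows that every mean cancels, leaving $X_{rbc} - \bar{X}$; squaring and summing over all RBC cells gives the seven sums of squares, the cross-product terms summing to zero.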

Table 16.10 gives the essentials, in symbols, for the analysis of variance
for the triple classification setup. In order to specify the interactions, we
here adopt the abbreviation scheme generally used. Thus R × B, read
R by B, indicates the row and block interaction, and R × B × C stands
for the row by block by column or three-way interaction. In a given
investigation, the rows, blocks, and columns refer to particular
independent or classificatory variables.

Table 16.10. Variance table for three-way classification into R rows,
B blocks, and C columns

It will be noted in Table 16.10 that the df for the three-way interaction
term is given as $(R - 1)(B - 1)(C - 1)$. The student may be helped in
understanding the reasoning which leads to this df by referring again to
Fig. 16.5. The surface means tend to restrict the deviation score values
within. How many cubical cells can we fill before these restrictions
operate? The general rule-of-thumb procedure for determining the df for
interaction sums of squares is to take the product of the df's of the variables
involved in the given interaction. This holds for two-way, three-way, and
higher-order interactions.
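
The rule of thumb is easily put into a few lines of Python. The sketch below, whose function name and labels are our own, lists the df for each source in a three-way classification with one score per cell and confirms that they add to RBC - 1.

```python
def three_way_df(R, B, C):
    """Degrees of freedom for each source, one score per cell."""
    main = {"R": R - 1, "B": B - 1, "C": C - 1}
    df = dict(main)
    # Interaction df: product of the df's of the variables involved
    df["R x B"] = main["R"] * main["B"]
    df["R x C"] = main["R"] * main["C"]
    df["B x C"] = main["B"] * main["C"]
    df["R x B x C"] = main["R"] * main["B"] * main["C"]
    return df

df = three_way_df(R=4, B=2, C=3)
print(df)
print(sum(df.values()))  # equals R*B*C - 1
```

For 4 rows, 2 blocks, and 3 columns the seven df's are 3, 1, 2, 3, 6, 2, and 6, which sum to 23.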


SPECIAL CASE WHERE THE ROWS STAND FOR 
PERSONS OR MATCHED INDIVIDUALS 

Suppose the purpose of a study is to ascertain whether variation on a
dependent variable is influenced by or associated with variation on two
independent variables. This, of course, involves the double classification
idea previously discussed, but we are now in a position to accomplish, by
means of three-way classification, two closely related things which could
not be done by the simpler two-way classification scheme.

1. If transfer, practice, fatigue, etc., effects are such that it is permissible
to make observations on an individual under each of the RC combinations
of conditions, we may increase the precision of an experiment by using
only m individuals instead of mRC individuals, as in the illustration
involving pursuit learning. Or we may make observations on mRC cases so
as to have in each of the RC cells m scores which are based on m sets of
matched individuals, thereby reducing error.

2. If we are dealing with a situation in which it is required that
observations be made on the same individual in each of the RC conditions, and if
more than one case is used either to reduce errors or to provide a basis for
generalizing to a population, it is necessary that we make statistical
allowance for the fact that the RC observations on the m cases are
nonindependent, or correlated. This allowance was not possible by the two-way
classification scheme, for which it was assumed that the m scores in one
cell were independent of the observations in the other cells.

It will be recalled that in the two-way classification setup, by letting one 
classification refer to R individuals or sets of matched cases, we were 
provided with an over-all test of significance for several correlated means 
for groups classified on a single independent variable. Triple classification 
permits a similar test of correlated means for groups involved in double 
classification. 

Since the assigning of the bases of classification to rows, blocks, and 
columns is arbitrary, we shall let the R rows stand for R individuals 
(or R matched persons), with the blocks and columns representing the 
independent variables to be investigated. 

COMPUTATIONAL ILLUSTRATION FOR THREE-WAY 
CLASSIFICATION 

The task of computing the required sums of squares (see Table 16.10) is
tedious. The first step is to arrange the data in some such systematic order
as that depicted in Table 16.9 and do the necessary adding to secure the
various sums indicated in that table. The total sum of squares for all RBC
cases is obtained as usual: sum all the scores, sum all the squared scores,
and substitute in the general formula $\frac{1}{RBC}\left[RBC\sum X^2 - (\sum X)^2\right]$.

To secure the three between-groups and the three simple interaction sums
of squares, we form three subtables involving sums taken in various
directions. For the first of these subtables we take row by column sums
obtained by adding cell entries from block to block, i.e., through the B
blocks. The next to the bottom section of Table 16.9 contains these row
by column sums, which we reproduce here as Table 16.11a. The reader
will note that the values for Table 16.11b are the right-hand margin sums of
Table 16.9 and that the values for Table 16.11c are found as the sums in
Table 16.9 along the bottom of each block.

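Table 16.11a. Required sums for row by column analysis

        1                         c                         C                         Sum
1       $\sum_b X_{1b1}$          $\sum_b X_{1bc}$          $\sum_b X_{1bC}$          $\sum_b\sum_c X_{1bc}$
r       $\sum_b X_{rb1}$          $\sum_b X_{rbc}$          $\sum_b X_{rbC}$          $\sum_b\sum_c X_{rbc}$
R       $\sum_b X_{Rb1}$          $\sum_b X_{Rbc}$          $\sum_b X_{RbC}$          $\sum_b\sum_c X_{Rbc}$
Sum     $\sum_r\sum_b X_{rb1}$    $\sum_r\sum_b X_{rbc}$    $\sum_r\sum_b X_{rbC}$    $\sum_r\sum_b\sum_c X_{rbc}$

Table 16.11b. Required sums for row by block analysis

        1                         b                         B                         Sum
1       $\sum_c X_{11c}$          $\sum_c X_{1bc}$          $\sum_c X_{1Bc}$          $\sum_b\sum_c X_{1bc}$
r       $\sum_c X_{r1c}$          $\sum_c X_{rbc}$          $\sum_c X_{rBc}$          $\sum_b\sum_c X_{rbc}$
R       $\sum_c X_{R1c}$          $\sum_c X_{Rbc}$          $\sum_c X_{RBc}$          $\sum_b\sum_c X_{Rbc}$
Sum     $\sum_r\sum_c X_{r1c}$    $\sum_r\sum_c X_{rbc}$    $\sum_r\sum_c X_{rBc}$    $\sum_r\sum_b\sum_c X_{rbc}$

Table 16.11c. Required sums for block by column analysis

        1                         c                         C                         Sum
1       $\sum_r X_{r11}$          $\sum_r X_{r1c}$          $\sum_r X_{r1C}$          $\sum_r\sum_c X_{r1c}$
b       $\sum_r X_{rb1}$          $\sum_r X_{rbc}$          $\sum_r X_{rbC}$          $\sum_r\sum_c X_{rbc}$
B       $\sum_r X_{rB1}$          $\sum_r X_{rBc}$          $\sum_r X_{rBC}$          $\sum_r\sum_c X_{rBc}$
Sum     $\sum_r\sum_b X_{rb1}$    $\sum_r\sum_b X_{rbc}$    $\sum_r\sum_b X_{rbC}$    $\sum_r\sum_b\sum_c X_{rbc}$

With these auxiliary tables in mind, we can write the required
computational formulas. The simple interaction terms are secured by computing
a subtotal sum of squares for each table and then subtracting therefrom the
two appropriate “between” sums of squares. These subtotal sums of
squares will not be the same as the total sum of squares obtained for
double classification by formula (16.4) because we are now dealing with
cell entries which are the sums of scores rather than single scores. Due
allowance for this can be made by a slight change in formula (16.4). The
amended formula, with notation appropriate for and specific to the three
auxiliary tables, may be written as follows:

Subtotal: row by column

$$\frac{1}{RBC}\left[RC\sum_r\sum_c\Bigl(\sum_b X_{rbc}\Bigr)^2 - \Bigl(\sum_r\sum_b\sum_c X_{rbc}\Bigr)^2\right] \tag{16.15a}$$

Subtotal: row by block

$$\frac{1}{RBC}\left[RB\sum_r\sum_b\Bigl(\sum_c X_{rbc}\Bigr)^2 - \Bigl(\sum_r\sum_b\sum_c X_{rbc}\Bigr)^2\right] \tag{16.15b}$$

Subtotal: block by column

$$\frac{1}{RBC}\left[BC\sum_b\sum_c\Bigl(\sum_r X_{rbc}\Bigr)^2 - \Bigl(\sum_r\sum_b\sum_c X_{rbc}\Bigr)^2\right] \tag{16.15c}$$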


From the right-hand margin of either Table 16.11a or 16.11b we can
compute the sum of squares for

Between rows:

$$\frac{1}{RBC}\left[R\sum_r\Bigl(\sum_b\sum_c X_{rbc}\Bigr)^2 - \Bigl(\sum_r\sum_b\sum_c X_{rbc}\Bigr)^2\right] \tag{16.15d}$$

From the bottom of either Table 16.11a or 16.11c we can obtain the sum of
squares for

Between columns:

$$\frac{1}{RBC}\left[C\sum_c\Bigl(\sum_r\sum_b X_{rbc}\Bigr)^2 - \Bigl(\sum_r\sum_b\sum_c X_{rbc}\Bigr)^2\right] \tag{16.15e}$$

From the bottom of Table 16.11b or from the right-hand margin of Table
16.11c we can calculate the sum of squares for

Between blocks:

$$\frac{1}{RBC}\left[B\sum_b\Bigl(\sum_r\sum_c X_{rbc}\Bigr)^2 - \Bigl(\sum_r\sum_b\sum_c X_{rbc}\Bigr)^2\right] \tag{16.15f}$$


Then from the above six sums of squares the simple interaction sums of
squares may be secured by the following subtractions.

Row by column interaction: (16.15a) − (16.15d) − (16.15e)      (16.16a)

Row by block interaction: (16.15b) − (16.15d) − (16.15f)      (16.16b)

Block by column interaction: (16.15c) − (16.15e) − (16.15f)      (16.16c)

And finally, again by subtraction, we have the sum of squares for the row
by column by block, or

Three-way interaction: Total sum of squares minus (16.15def)
minus (16.16abc).


We will illustrate the procedure by using the data of Table 16.12, in
which the blocks represent two levels of illumination, the columns three
degrees of albedo, and the rows four individuals, and the scores are judged
whiteness. Notice that each subject made judgments under all six of the
combinations of conditions. The sums given in Table 16.12 become the
entries for the auxiliary computational Tables 16.13abc. The needed value
of $\sum_r\sum_b\sum_c X_{rbc}$ is 898, and the sum of all the squared scores,
$\sum_r\sum_b\sum_c X_{rbc}^2$, is 44,394. From these figures we have

$$\frac{1}{24}\left[24(44{,}394) - (898)^2\right] = 10{,}793.83 = \text{total sum of squares}$$
Table 16.12. Data used in illustrating computations for three-way classification:
2 levels of illumination (blocks), 3 albedos (columns), and 4 observers (rows)*

                                     Albedo
Illumination    Observer      .07      .14      .26      Sum      Mean

1.20            1              11       24       60       95     31.67
                2              22       26       44       92     30.67
                3              16       22       55       93     31.00
                4              20       32       82      134     44.67
                Sum            69      104      241      414     34.50
                Mean        17.25    26.00    60.25              34.50

2.00            1              14       24       65      103     34.33
                2              27       36       47      110     36.67
                3              18       24       62      104     34.67
                4              24       59       84      167     55.67
                Sum            83      143      258      484     40.33
                Mean        20.75    35.75    64.50              40.33

Sums through    1              25       48      125      198     33.00
blocks          2              49       62       91      202     33.67
                3              34       46      117      197     32.83
                4              44       91      166      301     50.17
                Sum           152      247      499      898     37.42

Means for       1           12.50    24.00    62.50              33.00
rows by         2           24.50    31.00    45.50              33.67
columns         3           17.00    23.00    58.50              32.83
                4           22.00    45.50    83.00              50.17
                Column
                means       19.00    30.87    62.38              37.42

* Data from R. E. Taubman, J. Exp. Psychol., 1945, 35, 235-241. 


The various “between” sums can readily be obtained by adding the
squares of the appropriate marginal sums of auxiliary Tables 16.13abc
and substituting in formulas (16.15def).

For between blocks we need $(414)^2 + (484)^2 = 405{,}652$;

For between columns we need $(152)^2 + (247)^2 + (499)^2 = 333{,}114$;

For between rows we need $(198)^2 + (202)^2 + (197)^2 + (301)^2 = 209{,}418$.
Table 16.13a. Required sums for block by column analysis

                            Albedo
Illumination      .07       .14       .26       Sum

1.20               69       104       241       414
2.00               83       143       258       484
Sum               152       247       499       898

Table 16.13b. Required sums for row by block analysis

                            Individuals
Illumination        1         2         3         4       Sum

1.20               95        92        93       134       414
2.00              103       110       104       167       484
Sum               198       202       197       301       898

Table 16.13c. Required sums for row by column analysis

                            Albedo
Individual        .07       .14       .26       Sum

1                  25        48       125       198
2                  49        62        91       202
3                  34        46       117       197
4                  44        91       166       301
Sum               152       247       499       898


Then we have

$\frac{1}{24}[2(405{,}652) - (898)^2] = 204.17$ for between-blocks sum of squares
$\frac{1}{24}[3(333{,}114) - (898)^2] = 8039.08$ for between-columns sum of squares
$\frac{1}{24}[4(209{,}418) - (898)^2] = 1302.83$ for between-rows sum of squares

In order to secure the subtotal sums of squares we add the squares of the
cell entries in the auxiliary tables. For the block by column subtotal we
have from Table 16.13a:

$(69)^2 + (83)^2 + (104)^2 + (143)^2 + (241)^2 + (258)^2 = 167{,}560$

Similarly for the row by block subtotal we have from Table 16.13b:

$(95)^2 + (103)^2 + \cdots + (167)^2 = 105{,}508$

and for the row by column subtotal we have from Table 16.13c:

$(25)^2 + \cdots + (44)^2 + \cdots + (166)^2 = 87{,}814$
These three sums can now be substituted into formulas (16.15abc):

$\frac{1}{24}[6(167{,}560) - (898)^2] = 8289.83$ = block by column subtotal sum of squares

$\frac{1}{24}[8(105{,}508) - (898)^2] = 1569.17$ = row by block subtotal sum of squares

$\frac{1}{24}[12(87{,}814) - (898)^2] = 10{,}306.83$ = row by column subtotal sum of squares

Next we get the simple interaction sum of squares by the subtractions
indicated in formulas (16.16abc):

8289.83 - 204.17 - 8039.08 = 46.58 = block by column interaction

1569.17 - 204.17 - 1302.83 = 62.17 = row by block interaction

10,306.83 - 8039.08 - 1302.83 = 964.92 = row by column interaction

Then for the three-way interaction sum of squares we have 

10,793.83 - 204.17 - 8039.08 - 1302.83 

- 46.58 - 62.17 - 964.92 = 174.08 
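The arithmetic above can be retraced with a short script. The following is only a sketch: the scores are those of Table 16.12, while the array layout `X[b][r][c]` and the helper name `ss` are our own choices for illustration and not part of the original computational scheme.

```python
# Scores from Table 16.12, indexed as X[b][r][c]:
# b = illumination (block), r = observer (row), c = albedo (column).
X = [
    [[11, 24, 60], [22, 26, 44], [16, 22, 55], [20, 32, 82]],  # illumination 1.20
    [[14, 24, 65], [27, 36, 47], [18, 24, 62], [24, 59, 84]],  # illumination 2.00
]
B, R, C = len(X), len(X[0]), len(X[0][0])
N = R * B * C

grand = sum(x for blk in X for row in blk for x in row)        # grand sum of scores
sum_sq = sum(x * x for blk in X for row in blk for x in row)   # sum of squared scores


def ss(k, sums):
    """Pattern of formulas (16.15): (1/RBC)[k * (sum of squared sums) - (grand sum)^2],
    where k is the number of sums entering the first term."""
    return (k * sum(s * s for s in sums) - grand ** 2) / N


# Marginal sums and two-way cell sums (the entries of Tables 16.13abc).
row_sums = [sum(X[b][r][c] for b in range(B) for c in range(C)) for r in range(R)]
col_sums = [sum(X[b][r][c] for b in range(B) for r in range(R)) for c in range(C)]
blk_sums = [sum(X[b][r][c] for r in range(R) for c in range(C)) for b in range(B)]
rc_sums = [sum(X[b][r][c] for b in range(B)) for r in range(R) for c in range(C)]
rb_sums = [sum(X[b][r][c] for c in range(C)) for r in range(R) for b in range(B)]
bc_sums = [sum(X[b][r][c] for r in range(R)) for b in range(B) for c in range(C)]

total = (N * sum_sq - grand ** 2) / N
ss_rows = ss(R, row_sums)                              # (16.15d)
ss_cols = ss(C, col_sums)                              # (16.15e)
ss_blks = ss(B, blk_sums)                              # (16.15f)
ss_rc = ss(R * C, rc_sums) - ss_rows - ss_cols         # (16.16a)
ss_rb = ss(R * B, rb_sums) - ss_rows - ss_blks         # (16.16b)
ss_bc = ss(B * C, bc_sums) - ss_cols - ss_blks         # (16.16c)
ss_rbc = total - ss_rows - ss_cols - ss_blks - ss_rc - ss_rb - ss_bc

for name, value in [("total", total), ("rows", ss_rows), ("columns", ss_cols),
                    ("blocks", ss_blks), ("row x column", ss_rc),
                    ("row x block", ss_rb), ("block x column", ss_bc),
                    ("three-way", ss_rbc)]:
    print(f"{name:15s} {value:10.2f}")
```

Each printed value should agree, to two decimals, with the corresponding figure worked out by hand in the text.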

The several sums of squares, their dfs, and the resulting variance 
estimates are brought together in Table 16.14. 


Table 16.14. Analysis of variance for judged whiteness by 4 observers for
3 degrees of albedo and 2 levels of illumination

                                        Sum of                Variance
Source                                  Squares       df      Estimate

Illumination                             204.17        1        204.17
Albedo                                 8,039.08        2      4,019.54
Subjects (individual differences)      1,302.83        3        434.28
Interaction: I x A                        46.58        2         23.29
Interaction: I x S                        62.17        3         20.72
Interaction: A x S                       964.92        6        160.82
Interaction: I x A x S                   174.08        6         29.01
Total                                 10,793.83       23

We are not yet ready to discuss the principles controlling the choice of 
the error term appropriate for the possible Fs. When the models have been 
presented, the student may check back to see whether we have used, in the 
next two paragraphs, the correct denominator for the F ratio. 
First we use the three-way interaction as a basis for testing the
significance of the simple interactions. Of chief interest in this example
is the possible interaction between albedo and illumination, but since the
interaction variance is less than that for three-way interaction we know at
once without computing F that the interaction is insignificant. The
illumination by individual interaction is also insignificant. The interaction
of albedo with individuals yields an F of 160.82/29.01 = 5.54, which, for
$n_1 = 6$ and $n_2 = 6$, falls between the values of 4.28 and 8.47 for the .05
and .01 levels respectively. This F of 5.54 is high enough to suggest that the
form of the relationship between judged whiteness and albedo varies
somewhat from person to person.

Now we turn to a test of the main effects. A test of the significance of
row differences is a test of individual differences and is accordingly of
little interest. For illumination we have F = 204.17/20.72 = 9.85, which
falls near the 10.13 required for P = .05, and is therefore suggestive of a
real difference due to illumination. For albedo we have
F = 4019.54/160.82 = 24.99, which is highly significant.
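As a check on the three F ratios just quoted, the divisions can be laid out as follows. This is a minimal sketch: the variance estimates are copied from Table 16.14, the dictionary keys are our own labels, and the critical values are those quoted in the text rather than computed.

```python
# Variance estimates (mean squares) copied from Table 16.14.
ms = {"I": 204.17, "A": 4019.54, "IxS": 20.72, "AxS": 160.82, "IxAxS": 29.01}

# Albedo-by-individual interaction tested against the three-way interaction.
f_albedo_by_subject = ms["AxS"] / ms["IxAxS"]
# Each main effect tested against its own interaction with individuals.
f_illumination = ms["I"] / ms["IxS"]
f_albedo = ms["A"] / ms["AxS"]

# Critical values as quoted in the text (taken from an F table, not computed here).
print(f"A x S:        F = {f_albedo_by_subject:.2f}  (4.28 at .05, 8.47 at .01; df 6 and 6)")
print(f"Illumination: F = {f_illumination:.2f}  (10.13 at .05; df 1 and 3)")
print(f"Albedo:       F = {f_albedo:.2f}")
```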

Actually, the foregoing results are not to be regarded as conclusive.
The data which we have used to illustrate the computations are only a part
of more complete data which involved additional degrees of albedo and
other levels of illumination. Partly because of space limitations and partly
because it is easier to illustrate the computations when only a few rows,
columns, and blocks are involved, we have ignored a part of the available
data.

It should be kept in mind that this illustration is an example of the use of
the three-way classification scheme as a method for making allowance for
the use of correlated observations in a problem of double classification
involving the influence of two variables on a third. In this special use of
three-way classification, in which the rows correspond to individuals, the
objective is identical with that in the earlier analysis of pursuit rotor
learning (Table 16.7). The two situations are similar in that there are m
(or R) scores in each cell; they are different in that the m scores in any one
cell for the pursuit learning problem are independent of the m scores in
other cells, whereas the R scores in each of the albedo-illumination cells
are correlated: each person contributes a score to each cell. Both schemes
permit a check on the interaction effect of the two independent variables
used to classify the observations. The use of BC observations on each
of R cases (if feasible) will yield more precise information than obtainable
by having scores for m individuals in each of the BC cells. This is analogous
to the well-known principle that experimentation in which individuals
serve as their own controls tends to be more precise than that in which an
independent control group is set up.
THREE-WAY CLASSIFICATION WITH m CASES PER CUBICLE

We have seen how the possible association of a dependent variable with
three independent variables can be tested by a variance analysis made on a
one-score-per-cubicle basis. If we wish either to base results on more
observations or to test the significance of the three-way interaction,
it is necessary to have more than one score in each cubicle. This
can be accomplished either by assigning m individuals to each of the RBC
combinations of conditions or by using just m individuals with each
yielding an observation under all the RBC conditions or by using m sets of
matched individuals, with one individual of each set assigned to each of
the RBC groups. Matching may not be feasible; neither may the securing of
RBC observations on each of m individuals be feasible. At times, however,
there are considerations that require an observation on each individual
under all the conditions. Whether m individuals are so used by preference
or by necessity, we will have m measurements in each of the RBC
cubicles, but in testing the significance of the differences between the
means of rows or of columns or of blocks we will be dealing with a situation
in which the means are correlated because they are based upon the
same group. To allow for this fact we would need a four-way classification.

Let us next consider the case in which we have in each cubicle m scores
which are independent of the m scores in other cubicles. The total number
of scores will, of course, be mRBC, and the breakdown of the total sum
of squares will include the components specified in Table 16.10 plus a
within-cubicles sum of squares. Since each cubicle defines a group, the
“within” sum of squares does not differ from previously discussed
within sums of squares. The formula in this case is

$\frac{1}{m}\Big[m\sum\sum\sum\sum X^2_{rbc} - \sum_r\sum_b\sum_c\Big(\sum X_{rbc}\Big)^2\Big]$

It is understood that the $\sum X^2$ term contains mRBC squares and
that the subtractive term indicates that we first sum the m scores separately
for each cubicle, then square each of these sums, and finally sum all the
RBC squared sums. The df for this term will be mRBC - RBC because
we are dealing with the deviations of mRBC scores about RBC different
means.
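The within-cubicles formula can be sketched in a few lines. The function name `within_cubicles_ss` and the scores below are hypothetical, chosen only so that the result is easy to verify by hand; they are not data from the text.

```python
def within_cubicles_ss(cells):
    """cells: a list of cubicles, each a list of the m scores in that cubicle.
    Returns the within-cubicles sum of squares and its df."""
    m = len(cells[0])
    sum_sq = sum(x * x for cell in cells for x in cell)   # all mRBC squared scores
    sq_sums = sum(sum(cell) ** 2 for cell in cells)       # RBC squared cubicle sums
    ss = (m * sum_sq - sq_sums) / m
    df = len(cells) * m - len(cells)                      # mRBC - RBC
    return ss, df


# Hypothetical scores: 4 cubicles with m = 3 scores each.
cells = [[3, 5, 4], [6, 8, 7], [2, 2, 5], [9, 7, 8]]
ss_w, df_w = within_cubicles_ss(cells)

# The same quantity from deviations about each cubicle mean.
direct = sum((x - sum(cell) / len(cell)) ** 2 for cell in cells for x in cell)
print(ss_w, df_w, direct)
```

The computational form and the deviation form give the same sum of squares, which is the point of the identity behind the formula.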

For the situation with m scores per cubicle, the six computational formulas
(16.15) need only be modified by the use of 1/mRBC instead of 1/RBC as
the factor outside the brackets. It must be understood, however, that the
sums within the parentheses of formulas (16.15) will involve m times as many
scores as for the simpler situation with one case per cubicle. The
computation is again accomplished by auxiliary tables, the main cell entries of
which will, of course, also involve sums with m times as many scores. If
we think of the orderly arrangement of the original data, as exemplified in
Table 16.9, it will be seen that each cell in the separate block designations
will consist of m score entries; i.e., we will have m scores of the type
$X_{111}$ or $X_{211}$. A more precise notation would be to let $X_{irbc}$ stand for the
score of the ith person in the rth row and cth column of the bth block, with
i taking on values of $1, 2, \ldots, m$.

Except for the use of 1/mRBC in place of 1/RBC in formulas (16.15),
the computation of the between and simple interaction sums of squares
follows exactly the steps outlined for a single score per cubicle. The
three-way interaction sum of squares is again obtained by subtraction, but
now we must also deduct the within-cubicles sum of squares. Note that
in the formula of Table 16.10 which defines the three-way interaction term
we need to replace $X_{rbc}$ by $\bar{X}_{rbc}$, the mean of the m scores in the rth row and
cth column of block b.

CHOICE OF ERROR TERM IN THREE-WAY 
CLASSIFICATION 

The general mathematical model for the breakdown of a score in the
three-way classification setup may be written as

$(X_{rbck} - \mu) = \alpha_r + \alpha_b + \alpha_c + \alpha_r\alpha_b + \alpha_r\alpha_c + \alpha_b\alpha_c + \alpha_r\alpha_b\alpha_c + e_{rbck}$

in which the subscripts r, b, and c refer to rows, blocks, and columns, and
k takes on values $1, \ldots, m$, there being m independent replications (either
of measurement or of individuals) in each cell. The mean value of each
term on the right of the equality sign is zero; that is, all values are expressed
in deviation units. Note the manner in which the interactive effects are
designated: $\alpha_r\alpha_b$ is to be read as row by block interaction. Using notation
like that employed in specifying equations (16.12-16.14) from equation
(16.11) for two-way classification, we may replace the alphas by A's to
represent fixed values (fixed constants model) and by a's for classifications
involving samplings (random model). The mixed model would, of course,
contain one lower-case and two capital letters or two lower-case and one
capital letter.

Rather than rewrite the model equation with Latin letters specifying the
particular models, we can indicate the models by the following symbols.

$[A_rA_bA_c]$ for fixed constants model
$[a_ra_ba_c]$ for the random model
$[a_rA_bA_c]$ and $[a_ra_bA_c]$ for mixed models
It is assumed that the $a_r$, $a_b$, $a_c$, $a_ra_b$, $a_ra_c$, $a_ba_c$, $a_rA_b$, $a_rA_c$,
$a_bA_c$, $a_ra_ba_c$, $a_ra_bA_c$, $a_rA_bA_c$, and $e_{rbck}$ are random variates from normally
distributed populations of effects having the respective variances: $\sigma^2_r$, $\sigma^2_b$,
$\sigma^2_c$, $\sigma^2_{rb}$, $\sigma^2_{rc}$, $\sigma^2_{bc}$, $\sigma^2_{rB}$, $\sigma^2_{rC}$, $\sigma^2_{bC}$, $\sigma^2_{rbc}$, $\sigma^2_{rbC}$, $\sigma^2_{rBC}$, and $\sigma^2_e$ when
$k = 1, \ldots, m$ represents measurement replication or $\sigma^2_i$ when $k = 1, \ldots, m$
involves replication of individuals. There is seldom, if ever, an opportunity
to check on the normality of the several interactive effects, a fact
which may be disturbing to the reader. No such assumptions are made
regarding the effects $A_r$, $A_b$, $A_c$, $A_rA_b$, $A_rA_c$, $A_bA_c$, and $A_rA_bA_c$, which are
associated with the fixed constants. Since all effects are expressed in terms
of deviation units, the sum of each particular set of effects, such as $a_r$ or $A_b$
or $a_rA_b$ or $A_bA_c$, is zero; that is, e.g., $\sum\sum a_rA_b = 0$.

In order to choose the appropriate variance estimate for the denominator 
of F for a given significance test, we again need to indicate just what each 
possible variance estimate ($s^2$) estimates under nonnull conditions. A
summary statement will be given later regarding the assumption of homo¬ 
geneity of variance for the several cases involving three-way classification 
(p. 337). 

Case IX. Fixed constants model $[A_rA_bA_c]$, with m different individuals
in each of the RBC cubicles. This is a simple, straightforward case in which
$s^2_w \to \sigma^2_i$ and all the other seven $s^2$ values are estimates of $\sigma^2_i$ plus a single
(possible) effect, the one to be tested. Examples:

$s^2_r \to \sigma^2_i + \frac{mBC}{R-1}\sum A_r^2$

and

$s^2_{rb} \to \sigma^2_i + \frac{mC}{(R-1)(B-1)}\sum\sum (A_rA_b)^2$

Thus $s^2_w$ is the proper error term for testing all three main effects, all three
two-way interactions, and the three-way interaction. Generalizations are
to the population(s) from which the mRBC persons were drawn, but
conclusions regarding main effects, or factors (the $A_r$, $A_b$, and $A_c$ are often
spoken of as factors), will need to be qualified in case a given factor is
involved in a significant interaction.

A subcase under Case IX in which m = 1 will not provide the needed $s^2_w$
as the error term, hence is not a fruitful plan unless an estimate of $\sigma^2_i$
is possible under the assumption that an interaction is zero, but such an
assumption in psychological research is hazardous.

Case X. Fixed constants model, one person per cubicle but each person
measured m times. This leads to an $s^2_w$ which is an estimate of $\sigma^2_e$ rather
than the needed estimate of $\sigma^2_i$. Now it might be thought that this $s^2_w$
could be used to test $s^2_{rbc}$ for the presence of three-way interaction, but
note that since

$s^2_{rbc} \to \sigma^2_i + \frac{m}{(R-1)(B-1)(C-1)}\sum\sum\sum (A_rA_bA_c)^2$

and $s^2_w \to \sigma^2_e$, the division of $s^2_{rbc}$ by $s^2_w$ leads to a noninterpretable
F (if significant) because there is no way of knowing whether the
significance is due to individual differences (remember that $\sigma^2_i$ contains an error
of measurement part) or to three-way interaction. Stated differently, the
$s^2_{rbc}$ is an estimate in which error of measurement variance, true individual
difference variance, and possible three-way interaction effects are all
confounded, a term used to indicate that a given setup does not allow a
disentangling of the sources of variation which enter into a particular
estimate.

Case XI. Fixed constants model, with only one person supplying all
scores, i.e., a score (or scores) under each of the RBC combinations of
conditions. If we have m measures on the one person under each of the
RBC conditions, $s^2_w \to \sigma^2_e$, and each of the other seven variance estimates
has an expected value including $\sigma^2_e$ plus an effect. A significant F with $s^2_w$
as the error term permits only the conclusion that repetition of the
experiment on this same person would be expected to yield similar results, a
“generalization” which has no generality, and hence is worthless.

Case XII. Mixed model $[a_rA_bA_c]$. Typically, this will involve R
individuals assigned to the rows with each measured at least once under
the BC conditions. We have (with no measurement replication):

$s^2_r \to \sigma^2_e + BC\sigma^2_r$

$s^2_b \to \sigma^2_e + C\sigma^2_{rB} + \frac{RC}{B-1}\sum A_b^2$

$s^2_c \to \sigma^2_e + B\sigma^2_{rC} + \frac{RB}{C-1}\sum A_c^2$

$s^2_{rb} \to \sigma^2_e + C\sigma^2_{rB}$

$s^2_{rc} \to \sigma^2_e + B\sigma^2_{rC}$

$s^2_{bc} \to \sigma^2_e + \sigma^2_{rBC} + \frac{R}{(B-1)(C-1)}\sum\sum (A_bA_c)^2$

$s^2_{rbc} \to \sigma^2_e + \sigma^2_{rBC}$

Scrutiny of the foregoing expected values indicates that $s^2_{rbc}$ is
appropriate for testing the B x C interaction, that $s^2_c$ should be tested against
$s^2_{rc}$ and $s^2_b$ against $s^2_{rb}$. No test for $s^2_r$ is possible, but this is not serious
since it would only be a test of the significance of individual differences.
Nor is there a test for $s^2_{rb}$ and $s^2_{rc}$, the two interaction terms having to do
with individual differences in reaction to the defined experimental
conditions. We would need an estimate of $\sigma^2_e$ for this purpose; ordinarily,
such individual by condition interactions are real.

Case XIII. Mixed model $[a_ra_bA_c]$, with one score per cell. Researches
calling for this model in psychology are not plentiful. Suppose R children
are observed under C different social conditions by B observers, each of
whom rates (on a 10-point scale) each child in each of the situations for a
particular aspect of behavior, e.g., social participation. Primary interest
would be in the effect of the conditions (the $A_c$ effects) with secondary
interest in observer bias (the raters being regarded as a sample of
observers having $a_b$ “effects”) and possible interest in two-way interaction
effects. For model $[a_ra_bA_c]$ the meaning of the several variance estimates
is as follows:

$s^2_r \to \sigma^2_e + C\sigma^2_{rb} + BC\sigma^2_r$

$s^2_b \to \sigma^2_e + C\sigma^2_{rb} + RC\sigma^2_b$

$s^2_c \to \sigma^2_e + \sigma^2_{rbC} + B\sigma^2_{rC} + R\sigma^2_{bC} + \frac{RB}{C-1}\sum A_c^2$

$s^2_{rb} \to \sigma^2_e + C\sigma^2_{rb}$

$s^2_{rc} \to \sigma^2_e + \sigma^2_{rbC} + B\sigma^2_{rC}$

$s^2_{bc} \to \sigma^2_e + \sigma^2_{rbC} + R\sigma^2_{bC}$

$s^2_{rbc} \to \sigma^2_e + \sigma^2_{rbC}$

When we examine the foregoing expected values, we see that both $s^2_{rc}$
and $s^2_{bc}$ are testable against $s^2_{rbc}$ as the denominator for the Fs and that
$s^2_r$ and $s^2_b$ can be tested against $s^2_{rb}$, but $s^2_{rb}$ itself is not testable. The
great difficulty is that the main effect of primary interest, the $A_c$ effect, is
not amenable to test unless we can assume either $\sigma^2_{rC}$ or $\sigma^2_{bC}$ (or both) to be
zero. If $\sigma^2_{rC}$ were zero, $s^2_c$ could be tested against $s^2_{bc}$; if $\sigma^2_{bC}$ were zero,
we could use $s^2_{rc}$ to test $s^2_c$; if both were zero, we could use $s^2_{rbc}$ for testing the
main effect variance, $s^2_c$. We can scarcely make a priori the assumption
that either of these two two-way interactions is zero; in fact, the safest
presumption is that neither is zero. It is frequently asserted that the failure
of a two-way interaction to be significant when tested against its
appropriate error term can be used to justify the assumption of zero interaction,
but failure to be significant means only that it could be zero. Furthermore,
if R and B are small, a sizable interaction can go undetected. This issue,
along with a similar one, will be discussed later under the heading
“Preliminary tests and pooling.” Suffice it to say that model $[a_ra_bA_c]$ is not
recommended.

(For the situation involving R children and B observers with main
interest in the effect of the C conditions, we can simply sum or average
the B possible ratings for each child for each of the C conditions, then use
these sums or averages as the scores in a two-way mixed model setup with
R rows and C columns, which leads to a straightforward test of the effect
of conditions.)

Case XIV. Random model $[a_ra_ba_c]$. For a situation in
which all three bases of classification involve sampling, the reader will need to
know that the two-way interactions can be tested against $s^2_{rbc}$ but that there
is no way of testing the main effects without making untenable assumptions
regarding two-way interactions. This sad state of affairs is not too sad
simply because experimentation involving the random model, three-way
classification, is hard to come by.

Case XV. Mixed model $[a_rA_bA_c]$, but a pseudo three-way classification.
Suppose a sample of R individuals in block 1, a sample of R different
individuals in block 2, and so on. The B blocks represent B experimental
conditions, or B levels for a factor, the effects of which are to be
determined, and at the same time the C columns stand for another factor which
is also to be evaluated. The B sets of R individuals are used because it is
not feasible to use each person under each block condition. Or suppose that
the blocks stand for different groups (say, diagnostic) from each of which
R cases are drawn at random. We wish to compare the groups and also
the C conditions and perhaps the B x C interaction. This setup is often
referred to as the “split-plot design,” the plot concept coming from
agricultural experimentation. More recently, this design is said to involve
“nesting”: one group of R persons are nested in one block, another set
of R persons are nested in a second block, and so on, with never a move
from one block to another.

Let us re-examine Table 16.9 in order to determine how to set up the
model for this situation. We first note that for Case XII the variation
among the row means ($\bar{X}_{r\cdot\cdot}$) contributes to $s^2_r$ as an estimate of individual
difference variation, whereas for Case XV each of these row means is an
average for B different individuals; hence row means do not hold for
individuals. We do, however, have individual difference variation within
each block as represented by means of the type $\bar{X}_{rb\cdot}$ (right-hand part of
Table 16.9). Accordingly, we can anticipate a sum of squares for individual
differences which will involve combining the sums of squares within each
block; i.e., $C\sum_b\sum_r(\bar{X}_{rb\cdot} - \bar{X}_{\cdot b\cdot})^2$, with RB - B degrees of freedom. The
resulting variance estimate may be labeled $s^2_i$, for individual differences.

In ordinary three-way classification (Case XII) the B sets of means of
the type $\bar{X}_{rb\cdot}$ have to do with row (individual) by block interaction, an
interaction which reflects the failure of the individuals to maintain similar
score positions from block to block. But with independent cases in each
block, no block by row interaction is possible; a person cannot react to
more than one block when he is used in only one. Consider next the type of
mean at the bottom of Table 16.9: each is an average over B different
individuals who simply happened to have been assigned the same row number.

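Table 16.15. Modification of variance Table 16.10 for Case XV: R different and
independent individuals in each block

                                                                                                            Variance
Source              Sum of Squares                                                                  df                 Estimate

Individuals*        $C\sum_b\sum_r(\bar{X}_{rb\cdot} - \bar{X}_{\cdot b\cdot})^2$                                      RB - B             $s^2_i$
Blocks              $RC\sum_b(\bar{X}_{\cdot b\cdot} - \bar{X})^2$                                                     B - 1              $s^2_b$
Columns             $RB\sum_c(\bar{X}_{\cdot\cdot c} - \bar{X})^2$                                                     C - 1              $s^2_c$
B x C interaction   $R\sum_b\sum_c(\bar{X}_{\cdot bc} - \bar{X}_{\cdot b\cdot} - \bar{X}_{\cdot\cdot c} + \bar{X})^2$   (B - 1)(C - 1)     $s^2_{bc}$
Remainder           $\sum_b\sum_r\sum_c(X_{rbc} - \bar{X}_{rb\cdot} - \bar{X}_{\cdot bc} + \bar{X}_{\cdot b\cdot})^2$    B(R - 1)(C - 1)    $s^2_h$
Total                                                                                               RBC - 1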

* The sum of 

squares for individuals is computed by substituting in 



- ■— 

With the foregoing in mind, we write the following specific model
for Case XV:

$(X_{rbc} - \mu) = a_i + A_b + A_c + A_bA_c + h_{rbc}$

in which $h_{rbc}$ is the remainder after the other effects have been subtracted
from $(X_{rbc} - \mu)$. The several sums of squares and their dfs are given in
Table 16.15. Note how the first line differs from the first line of Table
16.10, and that the remainder is analogous to the row by column
interaction in Table 16.4. Actually, the remainder in Table 16.15 involves
possible individual by column interaction composed of ordinary row by
column interactions within each block. The expectations for the variance
estimates are as follows (recall that $\sigma^2_i$ contains $\sigma^2_e$ as a component).

$s^2_i \to \sigma^2_i$

$s^2_b \to \sigma^2_i + \frac{RC}{B-1}\sum A_b^2$

$s^2_c \to \sigma^2_e + \sigma^2_{rC} + \frac{RB}{C-1}\sum A_c^2$

$s^2_{bc} \to \sigma^2_e + \sigma^2_{rC} + \frac{R}{(B-1)(C-1)}\sum\sum (A_bA_c)^2$

$s^2_h \to \sigma^2_e + \sigma^2_{rC}$


From these values we see at a glance that $s^2_i$ is the proper error term for
testing $s^2_b$, a test which is analogous to $s^2_b/s^2_w$ in the one-way classification
setup for the difference between means of independent groups. For testing
$s^2_c$, the remainder estimate, $s^2_h$, is appropriate. Since $s^2_h$ is in part an
estimate of individual by column interaction, its use allows for the fact that
the column means are correlated (based on the same or related or matched
individuals).

The remainder variance estimate is also appropriate for testing the
B x C interaction. Note that this interaction involves C means in each
block that are independent of the C means in every other block, but at the
same time the C means within each block are not independent of each
other. This interaction has a special meaning when B stands for different
groups and C stands for C tests all scored in comparable standard score
form. The column means for each block are the basis for a
profile; hence a test of the B x C interaction tells us whether there are
significant differences among the profiles for the B groups.

Caution: Case XV as here outlined calls for the same number of
individuals per block (or group).
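The breakdown for Case XV can be sketched numerically. Everything specific here is an assumption made for illustration: the scores are hypothetical (B = 2 groups, R = 3 different persons per group, C = 2 conditions), the sums of squares follow the deviation formulas as we read them from Table 16.15, and the dictionary keys are our own labels.

```python
# Hypothetical scores X[b][r][c]; these are NOT data from the text.
X = [
    [[4, 6], [5, 9], [3, 6]],     # block 1: three persons, two column conditions
    [[7, 8], [6, 10], [8, 9]],    # block 2: three different persons
]
B, R, C = len(X), len(X[0]), len(X[0][0])


def mean(values):
    values = list(values)
    return sum(values) / len(values)


grand = mean(X[b][r][c] for b in range(B) for r in range(R) for c in range(C))
m_rb = [[mean(X[b][r]) for r in range(R)] for b in range(B)]            # person means
m_b = [mean(x for row in X[b] for x in row) for b in range(B)]           # block means
m_c = [mean(X[b][r][c] for b in range(B) for r in range(R)) for c in range(C)]
m_bc = [[mean(X[b][r][c] for r in range(R)) for c in range(C)] for b in range(B)]

ss = {
    "individuals": C * sum((m_rb[b][r] - m_b[b]) ** 2
                           for b in range(B) for r in range(R)),
    "blocks": R * C * sum((m_b[b] - grand) ** 2 for b in range(B)),
    "columns": R * B * sum((m_c[c] - grand) ** 2 for c in range(C)),
    "BxC": R * sum((m_bc[b][c] - m_b[b] - m_c[c] + grand) ** 2
                   for b in range(B) for c in range(C)),
    "remainder": sum((X[b][r][c] - m_rb[b][r] - m_bc[b][c] + m_b[b]) ** 2
                     for b in range(B) for r in range(R) for c in range(C)),
}
df = {"individuals": R * B - B, "blocks": B - 1, "columns": C - 1,
      "BxC": (B - 1) * (C - 1), "remainder": B * (R - 1) * (C - 1)}
total = sum((X[b][r][c] - grand) ** 2
            for b in range(B) for r in range(R) for c in range(C))

ms = {source: ss[source] / df[source] for source in ss}
f_blocks = ms["blocks"] / ms["individuals"]    # blocks against individuals within blocks
f_columns = ms["columns"] / ms["remainder"]    # columns against the remainder
f_bxc = ms["BxC"] / ms["remainder"]            # B x C interaction against the remainder
print(ss, df, total)
```

The five component sums of squares add up to the total, and their dfs add up to RBC - 1, which is a useful check on any hand computation of this design.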

Assumption of homogeneity of variance. Cases IX, X, and XI require
similar variances for all cubicles, but only Case IX permits a test of the
assumption. For Cases XI, XII, XIII, and XIV it is assumed that error of
measurement variance is the same from cubicle to cubicle. The assumption
for these cases is not testable unless we have measurement replication with
m scores per cubicle. Case XV assumes that the row variance within
blocks is homogeneous from block to block when $s^2_i$ is used to test $s^2_b$, and
that the row by column interaction within blocks is similar from block to
block when $s^2_c$ or $s^2_{bc}$ is tested against $s^2_h$. Both of these assumptions
are testable since the required within-blocks estimates are computable.

PRELIMINARY TESTS AND POOLING 

When we discussed Cases XIII and XIV, we found that certain effects
could not be tested without assuming that an interaction is zero. The
question is whether it is safe to assume an interaction is zero if it fails to
be significant when tested against an appropriate error term. The writers of
textbooks on mathematical statistics are remarkably mum on this point,
presumably because the situation gets too “iffy”: a main effect is significant
if F reaches, say, the .05 level, and if a certain interaction was not significant
at a specified level. Under such circumstances a P for an effect cannot have
the meaning it has when unencumbered by conditional probabilities.

Note that preliminary tests may have to do with the assumption of zero
interaction in the numerator term of F (as for Case XIII) or in the
denominator term (as for Cases II and X). Failure to satisfy the assumption of a
zero interaction in the numerator will lead to too many “significant” Fs.
Stated differently, significance for a main effect cannot be safely claimed,
since the numerator may reflect an interaction as well as the
main effects. Failure to satisfy an assumption of zero interaction in the
denominator will lead to too few significant Fs, which means that an
obtained F possesses greater significance than its P indicates.

Preliminary tests are also used in connection with the “pooling” of sums
of squares and of their dfs. To understand the meaning of pooling let us
consider Case IX in which all effects are testable against $s^2_w$. The
advocates of pooling would first test $s^2_{rbc}$ against $s^2_w$. If F is not significant
at, say, the .05 level, the sum of squares for the three-way interaction term
is combined with that of $s^2_w$, with the dfs also being summed. Dividing the
pooled sum of squares by the pooled df gives another estimate of variance
for the error term. This estimate is next used to test the two-way interactions,
which if insignificant provide additional sums of squares and dfs for
adding to the pool already made up.

The claimed advantage of pooling is that the number of degrees of
freedom for the denominator, or error, term of F is thereby increased with
a resultant more stable estimate of variance. But whether this procedure
provides an improved or better estimate depends, of course, on whether
the interactions judged to be insignificant are really zero in the sampled
population. Actually, the F based on the pooled values may be
larger or smaller than the F based on the appropriate variance estimate
obtained without pooling. When we examine the F table we see that the
gain in df does not have an appreciable effect, in the sense that a smaller F
is required for significance, except when $n_2$ is very small, say less than 8 or
10. It should be clearly noted that the gain in df by pooling does not lead
to a reduction in the sampling errors of the means being tested.

The use of preliminary tests as a basis for pooling is not nearly so
defensible as textbooks written prior to 1951 would have us believe. The
work of Paull† indicates that the usually advocated rule (that when F is less
than the value required for the .05 level, pooling is permissible and
advisable) is far from satisfactory. He sets up an elaborate set of rules leading
to the decision “never pool” or “sometimes pool” or “always pool.”
Space does not permit an exposition of his rules here. A simple rule to
follow when the dfs are equal, or when unequal provided both are greater
than 6, is to pool only when F is less than 2. Even when we follow the rules,
Fs based on pooling do not lead to Ps of precisely the same meaning as Ps
obtained from Fs which do not involve pooling.
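The mechanics of pooling are simple enough to sketch. The function name `pool`, its parameter `f_limit`, and all numerical values below are hypothetical; `f_limit` stands for whatever threshold the chosen rule supplies (the .05 critical value under the usual rule, or 2 under the simple rule just stated).

```python
def pool(ss_error, df_error, ss_term, df_term, f_limit):
    """Combine an interaction term with the error term when its F falls below f_limit.
    Returns the sum of squares and df to use for the error term afterwards."""
    f = (ss_term / df_term) / (ss_error / df_error)
    if f < f_limit:
        return ss_error + ss_term, df_error + df_term   # pooled
    return ss_error, df_error                           # left unpooled


# Hypothetical within-cubicles term and three-way interaction term.
ss_w, df_w = 480.0, 24
ss_rbc, df_rbc = 90.0, 6

ss_pooled, df_pooled = pool(ss_w, df_w, ss_rbc, df_rbc, f_limit=2.0)
pooled_estimate = ss_pooled / df_pooled
print(ss_pooled, df_pooled, pooled_estimate)
```

Here the interaction F is 15/20 = 0.75, so the terms are pooled; the sketch says nothing about whether the resulting P retains its usual meaning, which is the reservation raised in the text.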

HIGHER-ORDER CLASSIFICATION 

There are times when it is both desirable and feasible to study the 
variations of a dependent variable associated with variations in more than 
three variables. For such a study the data are classifiable in more than three 
ways. We have already mentioned the setup in which an observation is
made on each of m individuals under each of the combinations of conditions
defined by rows, blocks, and columns. There will be RBC scores for
each individual, and the scores may be classified not only as belonging to a 
given row and a specified column of a particular block but also as belong¬ 
ing to a certain individual. Although it is easy to make an orderly 
arrangement of the data for quadruple classification, the required compu¬ 
tations become somewhat burdensome. For the situation involving a 
fourth classification, based on either individuals or on a fourth independent 
variable, there will be 16 sums of squares: 1 for total, 4 for between groups, 
6 for two-way interactions, 4 for three-way interactions, and 1 for four-way 
interaction. When five classifications are used we will have sums of squares 
for the total, 5 betweens, 10 simple interactions, 10 triple interactions, 
5 quadruple interactions, and 1 fifth-order interaction. It is not within the 
scope of this book to outline the computations for these higher-order 
classifications.‡
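The counts of sums of squares quoted above follow from the number of ways of choosing which classifications enter a term. A brief sketch, in which the function name `interaction_counts` is our own:

```python
from math import comb


def interaction_counts(k):
    """For a k-way classification, the number of sums of squares of each order:
    order 1 = 'between' terms, order 2 = two-way interactions, and so on."""
    return {order: comb(k, order) for order in range(1, k + 1)}


four_way = interaction_counts(4)
five_way = interaction_counts(5)

# Adding 1 for the total sum of squares gives the full count for each design.
n_four = 1 + sum(four_way.values())
n_five = 1 + sum(five_way.values())
print(four_way, n_four)
print(five_way, n_five)
```

For the four-way case this reproduces the 4, 6, 4, and 1 of the text, 16 in all with the total; for five classifications the same counting gives 5, 10, 10, 5, and 1.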

The possibilities of the variance technique as a method of extracting
from one set of data information regarding not only primary effects but
also interactions have, at times, led to rather indiscriminate inclusions of
variables. For instance, a classification of subjects as male or female may
be made in order to determine possible sex differences. Since the typical
experiment for which the variance technique is used is likely to be based on
a relatively small number of subjects, it is very doubtful whether any
information of value will be added to the sum total of the already
inconsistent findings concerning sex differences.

† Paull, A. E., On a preliminary test for pooling mean squares in the analysis of
variance, Annals math. Stat., 1950, 21, 539-556.

‡ See Edwards, A. L., and Horst, P., The calculation of sums of squares for
interaction in the analysis of variance, Psychometrika, 1950, 15, 17-24.

Those who carry out studies involving more than three-way classification 
encounter great difficulty in interpreting significant higher-order inter¬ 
actions. Some have thought it safe after ascertaining the sums of squares 
for the primaries and the two-way and three-way interactions, to use the 
remainder variance, which is a composite of untested higher-order inter¬ 
actions, as an error term. Such a practice assumes insignificance for the 
interactions whose sums of squares are thus allowed to combine, but since 
there are instances of significant four-way interaction, the cautious 
investigator will extract and test all the possible interactions before using 
such a remainder as the error term for F. 

As a matter of fact, the choice of the proper error term for higher-order 
classifications is, at times, quite complicated. For the simple four-way 
setup involving the fixed constants model, with m replications of individuals
per cubicle, the within-cells estimate, $s_w^2$, is the correct error term for
testing all 4 main effects and all 11 interactions. For the mixed four-way
model with $a_r$ standing for individuals (a typical setup), the main effects
for the three fixed
constants factors are tested against the respective two-way interactions 
involving individuals, the three possible two-way interactions among the 
three fixed factors are testable against the appropriate three-way inter¬ 
actions involving individuals, and the three-way interaction for the three 
fixed factors can be tested against the four-way interaction. No inter¬ 
actions involving persons can be tested nor can the main individual differ¬ 
ence effect be tested. If anyone cooks up a research calling for the mixed 
model with two random and two fixed constants factors, he should be told 
that the three variances of principal interest (the two main effects for the 
fixed factors and their interaction) are not testable in any exact way. 

FACTORIAL AND LATIN SQUARE DESIGNS 

The student who encounters the term “factorial design” will need to 
know that it is difficult to make a distinction between factorial design and 
the analysis of variance setups discussed in this chapter. The bases for 
classification are referred to as factors; the categories within a
classification are termed “levels.” Perhaps the term factorial design is
inappropriate when one basis for classification is persons.

... balancing out the effects of possible fertility differentials from row
to row ... have used the Latin square principle ... for evaluating the effect
of ... a Latin square design might be very useful ... mathematical model,
which may be written as

$$(X_{rct} - \mu) = a_r + a_c + a_t + e_{rct}$$

The $a$s refer to row, column, and treatment effects, and $e_{rct}$ is the
remainder, or residual. It follows from the model that the breakdown of the
total sum of squares and degrees of freedom ... for rows, for columns, and
for treatments ...

When the foregoing ... we see a marked difference: the absence ...

... interactions are zero. This assumption is necessary for three (not
necessarily independent) reasons: (1) there are not enough degrees of freedom
available for taking out ..., (2) the main effects are confounded with
interactions, ... an error term ... each of three factors rather than just
T “treatments” ... the expected mean squares are (aside from a ...

... interaction, thus summing through blocks and dividing by 3 will yield
means that will show row by column interaction, but since this interaction 
is exactly the same within each block there is no three-way interaction. 

The boldface numerals in the three blocks are the “scores” for the 3x3 
Latin square to the right. Each of these boldface values enters the Latin 
square with its row and column designation intact and with its block 
source designated by A or B or C. For the Latin square so generated, it 
will be seen that the row means are all 4; ditto, the column means. But for 


Table 16.16. A Latin square generated from a three-way layout

         Block A        Block B        Block C        Latin square
Row      1   2   3      1   2   3      1   2   3      1    2    3
I        4   4   4      4   4   4      4   4   4      A4   B4   C4
II       2   4   6      2   4   6      2   4   6      B2   C4   A6
III      6   4   2      6   4   2      6   4   2      C6   A4   B2


the block effect we have from the Latin square the following means: 

$\bar{X}_A = (4 + 6 + 4)/3 = 4.67$
$\bar{X}_B = (4 + 2 + 2)/3 = 2.67$
$\bar{X}_C = (4 + 4 + 6)/3 = 4.67$

which are illusory as indications of a main effect because the effect was 
produced by the row by column interaction—no block differences held for 
the starting three-way situation. Compare this outcome with the expected
value for $s_t^2$ and note that the row by column interaction is not involved
in the expected values for $s_r^2$ and $s_c^2$.
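
The illusory block effect can be checked by drawing the Latin square cells out of the three-way layout of Table 16.16. The following Python sketch does this; the variable and function names are ours, and the data are those of the table.

```python
# Three-way layout of Table 16.16: the same 3x3 (row x column) table in every
# block, so there are no block differences and no three-way interaction.
cell = [[4, 4, 4],
        [2, 4, 6],
        [6, 4, 2]]
layout = {block: cell for block in "ABC"}

# Latin square: the block letter assigned to each (row, column) position.
square = [["A", "B", "C"],
          ["B", "C", "A"],
          ["C", "A", "B"]]

def latin_square_means(layout, square):
    """Return (row means, column means, block means) for the sampled cells."""
    n = len(square)
    scores = [[layout[square[r][c]][r][c] for c in range(n)] for r in range(n)]
    row_means = [sum(row) / n for row in scores]
    col_means = [sum(scores[r][c] for r in range(n)) / n for c in range(n)]
    block_means = {}
    for b in layout:
        picked = [scores[r][c] for r in range(n) for c in range(n)
                  if square[r][c] == b]
        block_means[b] = sum(picked) / len(picked)
    return row_means, col_means, block_means

rows, cols, blocks = latin_square_means(layout, square)
print(rows, cols, blocks)
```

The row and column means all come out at 4, while the block means differ, although every block of the starting layout holds identical scores.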

For the second, more common use in psychology of the Latin square, 
with rows standing for persons (animals), columns for order or sequence 
in testing, and Latin letters for experimental conditions (treatments), we 
have a mixed model [$a_r A_t A_c$] with rows as random variates. The expected
mean squares are (again omitting the common $\sigma_e^2$):


\ l T J°rTC + ~ - —+Ta\ 
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‘rTC + ° 2 rT + a 2 rC + -S a 
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The primary interest is in testing s 2 t , but we see no suitable error term 
unless it can be assumed that the order by treatment interaction is zero. 
Such an assumption is equivalent to saying that the influence of the order 
A, D, B, C (see Latin square on p. 341) is the same as the influence of the
order B, A, C, D; and so on. Whatever the order effect, whether it be
practice, fatigue, boredom, something physiological, change in mental
set, etc., it must be assumed that any such effects or combination thereof 
are independent of particular treatments. If, for example, treatments were 
various drugs, differences in residual effects would lead to order by 
treatment interaction. 

The reader will have noted that when F is taken as $s_t^2/s_{res}^2$, the
presence of order by treatment interaction will militate against getting a
significant F, and that if F reaches the $\alpha$ level of significance he can
claim significance at better than the $\alpha$ level, though how much better
remains unknown. The reader will have also noted that for a (typically) small
number of treatments, a single Latin square design uses so few cases
(T in number) that sampling errors will tend to be very large. The advantages
of larger N can be attained by replication: additional sets of T
persons provide additional Latin squares, for a discussion of which
the reader is referred to Cochran and Cox.§ And, finally, the reader
may not have noted that the presence, in the expectations for both $s_t^2$
and $s_{res}^2$, of the three interactions involving rows indicates that the
$T\sigma_t^2$ component must be relatively sizable in order to lead to an
appreciable F.

Concerning the merits of the Latin square design in psychological 
research, there has been a difference of opinion attributable in part to the
until recently unsettled question regarding the expected values of the
variance estimates when interactions are present. Now that the expectations
are known, it is seen that the substitution of a Latin square design in
lieu of a fixed constants factorial design has disadvantages that far
outweigh its only advantage, i.e., the making of observations on fewer
individuals. But the use of the Latin square design as a method of balancing
sequence effects and also as a method for using repeated observations on 
the same individuals (individual differences are extracted as a row effect, 
which is the gain from having correlated treatment means) has an appeal 
that must be evaluated against the worth of less “iffy” designs such as (1) 
random assignments of m individuals to each of the T treatment (or 
experimental) conditions or (2) the use of matched cases with matching on 
the basis of some relevant variable(s) or on the basis of pretest measures 
of the dependent variable under consideration. 

§ Cochran, W. G., and Cox, G. M., Experimental designs, 2nd ed., New York: John
Wiley, 1957.

SELECTED COMPARISONS

As in one-way classification, when F indicates that a main effect is 
significant, we may proceed to test specific contrasts among the means. 
The reasons for doing this and the general procedures are the same as those
set forth in the last section of Chapter 15, which the reader should review 
at this time. We limit the discussion here to the two-way classification 
setup, fixed effects model with equal ms in the cells and the mixed model. 
For the column means (or the row means), we could have a D or a D' 
computed exactly as before for the case of equal ms. 

In the fixed-effects situation, the needed standard error for a contrast
is given by the square root of

$$s_{D'}^2 = \frac{s_w^2}{mR}\left(\frac{1}{a} + \frac{1}{b}\right)$$

in which $s_w^2$ is the within cells variance estimate and a and b are the number
of means being averaged for a contrast. Again, when a = b = 1, we have
the error for a D-type contrast. The significance of a contrast springing
from an a priori hypothesis can be ascertained from $t = D/s_D$ (or
$t = D'/s_{D'}$), with df = mRC - RC. A contrast of the data-snooping
variety will be judged significant at the $\alpha$ level if $D/s_D$ (or $D'/s_{D'}$)
reaches K, where K is now defined as the square root of the product of (C - 1)
times the F required for the $\alpha$ level of significance for $n_1 = C - 1$ and
$n_2 = mRC - RC$ degrees of freedom. For comparisons involving row
means, R and C are simply interchanged.

For the mixed model with C means based on the same R persons (or R
sets of matched individuals), a contrast of the D type will have
$s_D = s_{rc}\sqrt{2/R}$. Given an a priori hypothesis, we have $t = D/s_D$ with
(R - 1)(C - 1) degrees of freedom, whereas for a contrast suggested by
an examination of the data, $D/s_D$ must reach K, which this time is the
square root of the product of (C - 1) times the F required for $\alpha$ with
$n_1 = C - 1$ and $n_2 = (R - 1)(C - 1)$ degrees of freedom.
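
The standard errors for the two situations can be sketched in Python as follows. The function names and the numerical values are hypothetical, chosen only for illustration; the tabled F needed for K must still be looked up and supplied by the user.

```python
from math import sqrt

def se_contrast_fixed(s2_w, m, R, a=1, b=1):
    """Standard error of a contrast among column means, fixed-effects model.

    s2_w: within-cells variance estimate; m: cases per cell; R: rows.
    a, b: number of column means averaged on each side of the contrast.
    """
    return sqrt(s2_w / (m * R) * (1 / a + 1 / b))

def se_contrast_mixed(s2_rc, R):
    """Standard error of a D-type contrast when the C means rest on the same R persons."""
    return sqrt(s2_rc) * sqrt(2 / R)

def scheffe_K(F_crit, C):
    """K for a data-snooping contrast, given the tabled F for C - 1 and n2 df."""
    return sqrt((C - 1) * F_crit)

# Hypothetical numbers, for illustration only.
D = 3.0
s_D = se_contrast_fixed(s2_w=12.0, m=5, R=3)
t = D / s_D
print(round(s_D, 4), round(t, 4))
```

For an a priori contrast the resulting t is referred to the t table with the df given in the text; for a contrast suggested by the data, the same ratio is compared with K.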

It should be noted that for the mixed model situation neither the 
procedure involving t nor that involving K makes any allowance for the 
possibility that the correlation between the scores in the columns involved 
in a particular contrast may differ from the average of the C(C - 1)/2
intercorrelations entering into $s_{rc}^2$. The value of t could, of course, be
calculated independently of the over-all row by column interaction, but
it is not clear whether the Scheffé method permits this alteration.

Apparently neither the t approach nor the Scheffé method is applicable
for contrasts of the D' type in the mixed model, but there appears to be
little need for D' comparisons in the mixed model situation.




Chapter 17 

TRENDS AND DIFFERENCES 

IN TRENDS 


So-called trend analysis is, in essence, a part of the larger problem of the 
relationship between variates when we have an independent-dependent 
variable situation. Correlational analysis is appropriate for specifying 
relationships between individual difference variables regardless of whether 
or not one variable can be characterized as dependent on the other as an 
independent variable. When it can be argued that one variable is dependent 
(consequent) and the other independent (antecedent), there may be some 
interest in the regression of the dependent on the independent variable, 
both variates being individual difference variables. We have already given 
methods for testing the significance of regression coefficients (p. 142), 
for the equivalent testing of the significance of linear regression (p. 272), 
for testing linearity (p. 275), and for testing the difference between re¬ 
gression coefficients based on independent samples (p. 143). 

Although our discussion of the analysis of variance has been mainly 
concerned with the significance of the differences between means, the 
perceptive reader will have noted that when a basis of classification involves
an ordered variable, or factor, such as distance, degree of illumination,
size, etc., which is manipulated as an independent variable, the F test for a
main effect is really concerned with whether or not some dependent
variable, X, is being affected by the factor. That is, is X as a dependent
variable influenced by or related to the manipulated variable? This may
be regarded as a question of regression (most mathematical statisticians
subsume all analysis of variance under regression analysis) or more simply
a question of trend and its form. For this situation the correlation
coefficient ceases to be a useful descriptive term, but the presence of linear
trend and the slope thereof is of interest as will also be the possible
curvilinearity of the relationship. Or differences among trends may be of
primary interest.

Some of the techniques to be presented in this chapter are frequently
subsumed under the topic “Orthogonal Polynomials.”

Review and recast. When we have G levels on a factor, or independent
variable, with m different individuals randomly assigned into each of the
G groups, we have a one-way classification design (possible analyses
suggested on pp. 270-81) with Y as the dependent and X as the independent
variable. The ms per group need not be the same, although equal ms
are preferable.

When we have C levels on the ordered factor and R levels on a second 
ordered factor (or R conditions not orderable), with m independent cases 
assigned to each of the RC cells, we have a two-way design. A plot of the 
X means against the C values (or levels), done separately for each of the R 
levels (or conditions), will permit the drawing of R trend lines (as in 
Figs. 16.1-16.4, pp. 307-08). Or a plot of the appropriate X means
against the R values (or levels for the factor identified with the rows), this
time separately for the C levels, will permit the drawing of C trend lines.
The test of the R x C interaction provides a test of the difference
between the R trend lines (or the C trend lines when the row factor is
ordered).

When we have C levels for one ordered variable and B levels on a
second factor (quantitative or qualitative) with each of R individuals
measured under all the BC combinations of conditions (a three-way,
mixed model), a test of $s_{bc}^2$ against $s_{rbc}^2$ is a test of the difference between
the B trends plotted with appropriate X means against the column factor
(or between the C trends when X means are plotted against the B levels
when blocks stand for an ordered variable). Note that the C means
entering into the trend for each of the B levels are correlated (based on the
same individuals); ditto, the B means for C trends. The use of $s_{rbc}^2$ as the
error term allows for the correlation.

If for the B levels we used B sets of different persons, R persons per set,
the C means for the trend of X against the C levels would again be
correlated but the B trend lines would be uncorrelated. The test of the B x C
interaction, as specified in Case XV, p. 335, is appropriate for testing the
differences among the B trends.

A significant interaction in any of the foregoing types of situations 
simply means that the trends or curves are not parallel, regardless of their 
general shape, or the form of the relationships. A presentation of the 
trend lines or a description thereof is necessary for an interpretation of any 
claimed statistically significant differences among the trends.

LINEAR TRENDS

The specification and testing of linear trends in psychology is of special
interest for two reasons: (1) many relationships are linear in form,
sometimes predictably so from theory, and (2) the question as to whether a
relationship is nonlinear is readily approached via a test of departure from
linearity. Although we have already set forth a method (pp. 272-75) for
testing linear trend (linear regression) and for testing nonlinearity, there is
a somewhat shorter approach which is applicable only when the levels on
the factor being varied experimentally are evenly spaced and there are m
scores (measures) at each of the G levels. We will need to distinguish
between two situations: (1) when the m scores are uncorrelated from level
to level (i.e., m independent cases assigned randomly to the G groups)
and (2) when the m scores are correlated (i.e., based on just m persons or
m sets of matched persons). The first of these will involve one-way, the
other two-way, analysis of variance, but some computational methods
developed for the first will also be applicable to the second.

Linear trend: uncorrelated observations. First, a little algebra. It
will be recalled that the regression sum of squares was shown to be equal to
$N r^2 S_y^2$, where Y is the dependent variate and X the independent variate,
the variable for which we now choose the G levels. We have

$$\sum (Y' - \bar{Y})^2 = N r^2 S_y^2 = N\left(\frac{\sum xy}{N S_x S_y}\right)^2 S_y^2 = \frac{(\sum xy)^2}{N S_x^2} = \frac{(\sum xy)^2}{\sum x^2}$$

But

$$\sum xy = \sum (X - \bar{X})(Y - \bar{Y})$$
$$= \sum XY - \bar{Y}\sum X - \bar{X}\sum Y + \sum \bar{X}\bar{Y}$$
$$= \sum XY - \bar{Y}N\bar{X} - \bar{X}\sum Y + N\bar{X}\bar{Y}$$
$$\sum xy = \sum XY - \bar{X}\sum Y$$

Now consider the sum, $\sum xY$, with x in deviation units and Y in original Y
units.

$$\sum xY = \sum (X - \bar{X})Y = \sum XY - \bar{X}\sum Y$$

Thus $\sum xy = \sum xY$.
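
The identity just derived is easily verified numerically. The Python sketch below uses made-up scores; the function name is ours.

```python
def deviation_cross_products(X, Y):
    """Return (sum xy, sum xY): x, y are deviations from the means, Y is raw."""
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(Y) / n
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(X, Y))
    sum_xY = sum((xi - mean_x) * yi for xi, yi in zip(X, Y))
    return sum_xy, sum_xY

# Made-up scores, for illustration only.
X = [1, 2, 3, 4, 5, 6]
Y = [3, 5, 4, 8, 9, 13]
sum_xy, sum_xY = deviation_cross_products(X, Y)
print(sum_xy, sum_xY)
```

The two sums agree because the deviations x sum to zero, so that subtracting the mean of Y from every Y leaves the cross-product sum unchanged.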

To simplify computations we may code the X variates into numerically
small values, with a mean of zero so as to possess one property of deviation
scores. Let us use v for the coded values of the G X values, or points, used
to define the G levels. If G, the number of levels, is an odd number, we can
assign a v of 0 to the middle level and have coded values of

..., -4, -3, -2, -1, 0, 1, 2, 3, 4, ...

and if G is an even number we can assign -1 and +1 to the two middle
levels and have coded values of

..., -7, -5, -3, -1, 1, 3, 5, 7, ...

Let $v_g$, with g = 1 ... G, be the coded value for the gth group (level) and
let $Y_g$ be the Y scores for those in the gth group. Then

$$\sum vY = \sum v_1 Y_1 + \cdots + \sum v_g Y_g + \cdots + \sum v_G Y_G$$
$$= v_1\sum Y_1 + \cdots + v_g\sum Y_g + \cdots + v_G\sum Y_G$$


Simply sum the m Y scores for each group (level), multiply by the v for the
level, and sum over groups, thus obtaining what we will designate as $\sum vY$
instead of $\sum v_g \sum Y_g$, a more exact symbolization. (The G separate sums of
scores will already have been obtained when computing the total,
between-groups, and within-groups sums of squares.)

The regression sum of squares, $(\sum xy)^2/\sum x^2$, will be $(\sum vY)^2/\sum v^2$
in terms of the vs, or coded Xs. With m cases per level, we
have $\sum v^2 = \sum m v_g^2 = m\sum v_g^2$. Simply square the (numerically small) v
values, sum, and multiply by m. Thus, we have for the regression sum of
squares,

$$SS_{(Y' - \bar{Y})} = m\sum (Y'_g - \bar{Y})^2 = (\sum vY)^2 / m\sum v_g^2$$

which, since it has 1 degree of freedom, corresponds to the $s_\ell^2$ of p. 274.
This is sometimes called the variance estimate for the linear component.
It must not be forgotten that the foregoing computationally simple
approach holds only for equal spacings for the levels on the independent
variable, X, and for equal ms in the G groups (at the G levels of X).
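
As a check on the coding shortcut, the following Python sketch computes the regression sum of squares both from the coded values and directly from the raw X levels. The data and the function names are made up for illustration; equal spacing and equal ms are assumed, as the text requires.

```python
def coded_values(G):
    """Coded v values with mean zero: step 1 for odd G, step 2 for even G."""
    if G % 2:
        return [g - (G - 1) // 2 for g in range(G)]
    return [2 * g - (G - 1) for g in range(G)]

def linear_ss_coded(groups):
    """(sum vY)^2 / (m * sum v_g^2) for equally spaced levels, m scores each."""
    m = len(groups[0])
    v = coded_values(len(groups))
    sum_vY = sum(vg * sum(scores) for vg, scores in zip(v, groups))
    return sum_vY ** 2 / (m * sum(vg ** 2 for vg in v))

def linear_ss_direct(levels, groups):
    """(sum xy)^2 / sum x^2 computed from the raw X levels."""
    X = [x for x, scores in zip(levels, groups) for _ in scores]
    Y = [y for scores in groups for y in scores]
    mx = sum(X) / len(X)
    sum_xY = sum((x - mx) * y for x, y in zip(X, Y))
    sum_x2 = sum((x - mx) ** 2 for x in X)
    return sum_xY ** 2 / sum_x2

# Made-up data: G = 4 equally spaced levels, m = 3 scores per level.
levels = [10, 20, 30, 40]
groups = [[5, 7, 6], [8, 9, 7], [9, 12, 11], [13, 12, 14]]
print(linear_ss_coded(groups), linear_ss_direct(levels, groups))
```

The two routes agree because the coded values are a linear transformation of the raw levels, and the ratio of the squared cross-product sum to the sum of squares is unaffected by such a transformation.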

By computational methods already given (formulas 15.6-15.8) we can
obtain the total, the within-levels, and the between-levels sums of squares
for the Ys. Recall from p. 270 that

$$\sum_g \sum (Y - \bar{Y})^2 = \sum_g \sum (Y - \bar{Y}_g)^2 + m\sum (\bar{Y}_g - \bar{Y})^2$$

and from p. 276 that

$$m\sum (\bar{Y}_g - \bar{Y})^2 = m\sum (\bar{Y}_g - Y'_g)^2 + m\sum (Y'_g - \bar{Y})^2$$

From the last equation we see that

$$m\sum (\bar{Y}_g - Y'_g)^2 = m\sum (\bar{Y}_g - \bar{Y})^2 - m\sum (Y'_g - \bar{Y})^2$$

provides a way of calculating the sum of squares for the deviation of the
array, or group, means from linear form.

The breakdown of the total sum of squares along with dfs and variance
estimates may be assembled, as in Table 17.1. It will be noted that this
table does not contain the “residual from line” component of Table 15.4,
which was used there as the error term for testing significance.
Actually we do not need this as an error term; we may take $F = s_\ell^2/s_w^2$.
The $n_2$ for the F table will be mG - G, a value that will be somewhat
(usually slightly) smaller than the $n_2 = mG - 2$ when the variance
estimate based on the residuals about the line is used as error. This slight
loss in df will, in the long run, be compensated for by the fact that $s_w^2$
tends always to be smaller than the estimate based on the residuals about
the line.


Table 17.1. Analysis of variance for linear trend, Y as dependent on G levels of X

Source                            Sum of Squares                      df        Var. Est.
Between levels                    $m\sum(\bar{Y}_g - \bar{Y})^2$      G - 1     $s_b^2$
  Linear trend                    $m\sum(Y'_g - \bar{Y})^2$           1         $s_\ell^2$
  Deviation of means from line    $m\sum(\bar{Y}_g - Y'_g)^2$         G - 2     $s_d^2$
Within levels                     $\sum_g\sum(Y - \bar{Y}_g)^2$       mG - G    $s_w^2$
Total                             $\sum_g\sum(Y - \bar{Y})^2$         mG - 1



A significant $F = s_\ell^2/s_w^2$ has various connotations for various people:
a significant linear relationship, a significant linear trend, a significant
linear correlation, a significant linear regression, a significant linear slope
($B_{yx}$ significantly different from zero), a significant linear rate of change, a
significant linear component of trend.

The departure from linear trend is testable by way of $F = s_d^2/s_w^2$,
and the main effect of X (differences among the Y means for the G groups)
is tested by $F = s_b^2/s_w^2$. Ordinarily, if $s_b^2$ is significant, we would expect
$s_\ell^2$ also to be significant. It is possible for $s_b^2$ to be insignificant
while $s_\ell^2$ is significant simply because the latter takes into consideration a
progressive increase (or decrease) in the G means. For example, if for
five successive levels of the X variable the Y means were 16, 19, 21, 23, and
26, we would intuitively regard X as having a greater effect than if the
values were 19, 26, 16, 23, and 21. For both of these sets of means the
values of $s_b^2$ are, of course, identical. Suppose $s_w^2$ is such that $F = s_b^2/s_w^2$
is significant only at the .10 level and that we test the X effect for both sets
by way of the linear trend. So doing yields significance at less
than the .001 level for the first set and no significance whatever for the
second set. The important point is that greater significance than that
reached by $s_b^2$ should emerge if there is a systematic (linear) trend of the
means. This is somewhat analogous to the advantage accruing from a
one-tail test when the direction of the difference between two means has
been predicted from theory. Certainly, if we have predicted a systematic
increase (or decrease) of Y means for the G levels of X, the extent to
which observations do show the predicted trend should somehow emerge
in the statistical analysis.
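
The contrast between the two sets of means can be made concrete with a short Python sketch. The number of cases per level, m, is not given in the text and is set to a hypothetical value here; the function name is ours.

```python
def trend_breakdown(means, m):
    """Between-levels, linear, and deviation-from-line SS from G group means.

    Assumes equally spaced levels and m scores per level.
    """
    G = len(means)
    v = [g - (G - 1) / 2 for g in range(G)]          # coded values, mean zero
    grand = sum(means) / G
    ss_between = m * sum((y - grand) ** 2 for y in means)
    sum_vY = sum(vg * m * y for vg, y in zip(v, means))  # v_g times group sum
    ss_linear = sum_vY ** 2 / (m * sum(vg ** 2 for vg in v))
    return ss_between, ss_linear, ss_between - ss_linear

m = 10  # hypothetical number of cases per level
for means in ([16, 19, 21, 23, 26], [19, 26, 16, 23, 21]):
    print(means, trend_breakdown(means, m))
```

Both orderings give the same between-levels sum of squares, but nearly all of it is linear for the first ordering and almost none for the second, which is why the 1-df test for linear trend separates the two cases so sharply.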

Parenthetically, there are times when theory predicts a directional
outcome for G experimental conditions not involving levels on a manipulable
ordered variable: the factor is qualitative instead of quantitative.
Suppose for four conditions labeled A, B, C, and D that theory predicts
$\bar{Y}_A > \bar{Y}_C > \bar{Y}_D > \bar{Y}_B$ and the observations tend to confirm this predicted
ordering. Unfortunately, there seems as yet to be no satisfactory way to
incorporate the predicted ordering of results into a significance test.

Linear trend: correlated observations. So far our treatment of a 
linear trend has been confined to the setup where the m scores for each of 
the G groups, or levels, are independent from group to group. Suppose 
each person is measured at each level, and that the levels are again chosen 
to be equally spaced on the factor, or independent variable. This becomes 
a two-way analysis of variance setup, mixed model, with R rows for R 
persons and C columns for C levels on the factor. The differences among 
the resulting correlated column means, it will be recalled, are tested by
$F = s_c^2/s_{rc}^2$. The means for the C columns when plotted against the C
values of the independent variable may show a trend the linear component
of which we may wish to test. With equal spacings for the levels on the
independent variable, we may again set up coded, or v, scores for the points
on the independent variable and proceed to compute the sum of squares
for the linear component of the trend, $R\sum(X'_{.c} - \bar{X})^2$, as $(\sum vX)^2/R\sum v^2$ in
exactly the manner indicated earlier for G independent groups. (Note: we
use X here as the dependent variable since the entire discussion of two-
and three-way analysis of variance has been in terms of Xs.) The sum of
squares has 1 degree of freedom, hence is equal to an $s_\ell^2$.

What we have done here is to break up the between-column sum of
squares into two component parts, a linearly predicted part and deviations
of the means from linearity:

$$R\sum_c (\bar{X}_{.c} - \bar{X})^2 = R\sum_c (X'_{.c} - \bar{X})^2 + R\sum_c (\bar{X}_{.c} - X'_{.c})^2$$

which again allows us to obtain the sum of squares for deviations from
linearity by subtraction. This sum divided by C - 2 will give an $s_d^2$. Thus
$F_c = s_c^2/s_{rc}^2$ is a test of the main effect of the factor; $F_\ell = s_\ell^2/s_{rc}^2$ tests
the linear trend or linear regression; and $F_d = s_d^2/s_{rc}^2$ provides a test of
the departure from linearity.
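
The three F ratios for the correlated case can be sketched in Python as follows. The scores are made up and the function and key names are ours; the sketch assumes equally spaced levels, one score per person per level, and at least three levels.

```python
def correlated_trend(scores):
    """Trend breakdown for R persons (rows) each measured at C levels (columns).

    Returns F ratios for columns, linear trend, and departure from linearity,
    each tested against the row by column interaction.
    Assumes equally spaced levels and C >= 3.
    """
    R, C = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (R * C)
    row_means = [sum(row) / C for row in scores]
    col_sums = [sum(row[c] for row in scores) for c in range(C)]
    col_means = [s / R for s in col_sums]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = C * sum((mr - grand) ** 2 for mr in row_means)
    ss_cols = R * sum((mc - grand) ** 2 for mc in col_means)
    ss_rc = ss_total - ss_rows - ss_cols          # interaction (error term)

    v = [c - (C - 1) / 2 for c in range(C)]       # coded levels, mean zero
    sum_vX = sum(vc * s for vc, s in zip(v, col_sums))
    ss_linear = sum_vX ** 2 / (R * sum(vc ** 2 for vc in v))
    ss_dev = ss_cols - ss_linear

    s2_rc = ss_rc / ((R - 1) * (C - 1))
    return {
        "F_c": (ss_cols / (C - 1)) / s2_rc,
        "F_linear": ss_linear / s2_rc,            # 1 df
        "F_d": (ss_dev / (C - 2)) / s2_rc,
    }

# Made-up scores: 4 persons measured at 3 equally spaced levels.
scores = [[3, 5, 8], [2, 5, 6], [4, 4, 9], [3, 6, 9]]
print(correlated_trend(scores))
```

Each F is referred to the table with (R - 1)(C - 1) df for the denominator and C - 1, 1, or C - 2 df, respectively, for the numerator.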

The use of $s_{rc}^2$ instead of $s_w^2$ as the error term distinguishes between
two situations involving the relationship of a dependent and a manipulated
variable (factor): $s_w^2$ is used when the dependent variate scores are
independent from level to level of the factor, and $s_{rc}^2$ is used when the
dependent variate scores are themselves correlated from level to level
either because each of R individuals is measured at each level or because we
have R sets of matched individuals, C per set with random assignment
from within each set to the C levels.

DIFFERENCES AMONG SLOPES 

Earlier (p. 143) a method was given for testing the difference between 
two regression coefficients (linear slopes) based on independent samples. 
It will be recalled that this difference was tested against an estimated 
standard error that depended on the within groups residuals about the 
two regression lines. 

Slope differences for independent groups. A test of the difference
among three or more regression coefficients likewise depends on the within
groups residuals. The procedure entails the calculation of the sum of
squares, $\sum x^2$ for X, $\sum y^2$ for Y, and $\sum xy$, all three separately for each of the
G groups. The two sums of squares are computed by (3.6) and the
cross-product sum by $\sum XY - \sum X\sum Y/N$, with N replaced by $m_g$.

Table 17.2. Calculations for testing differences among G slopes

Group    df                $\sum x^2$    $\sum y^2$    $\sum xy$     $B_{yx}$     $\sum(Y'-\bar{Y})^2$   $\sum(Y-Y')^2$              df for residual
1        $m_1 - 1$         $A_1$         $B_1$         $C_1$         $C_1/A_1$    $C_1^2/A_1$            $B_1 - C_1^2/A_1$           $m_1 - 2$
g        $m_g - 1$         $A_g$         $B_g$         $C_g$         $C_g/A_g$    $C_g^2/A_g$            $B_g - C_g^2/A_g$           $m_g - 2$
G        $m_G - 1$         $A_G$         $B_G$         $C_G$         $C_G/A_G$    $C_G^2/A_G$            $B_G - C_G^2/A_G$           $m_G - 2$
Sum      $\sum m_g - G$    $\sum A_g$    $\sum B_g$    $\sum C_g$                                        $\sum(B_g - C_g^2/A_g)$     $\sum m_g - 2G$
Within   $\sum m_g - G$    $A_w$         $B_w$         $C_w$         $C_w/A_w$    $C_w^2/A_w$            $B_w - C_w^2/A_w$           $\sum m_g - G - 1$


A tabular arrangement (Table 17.2) of these sums, along with additional
indicated calculated values, will facilitate the exposition. In this table the
As, Bs, and Cs represent $\sum x^2$, $\sum y^2$, and $\sum xy$, respectively. The slope, or
$B_{yx}$, is calculated as $\sum xy/\sum x^2$; the regression sum of squares, $\sum(Y' - \bar{Y})^2$,
is given by $(\sum xy)^2/\sum x^2$; and the residual sum of squares, $\sum(Y - Y')^2$, is
obtained by subtracting the regression sum of squares from $\sum y^2$. The first
four and the last two columns are summed downward to get the “sum”
line, and the first four of these sums are entered as the first four values in
the “within” line. The next three entries in the “within” line are obtained
from the $A_w$, $B_w$, and $C_w$ of the “within” line, not by summing downward.
Note that the $A_w$, $B_w$, $C_w$, being $\sum A_g$, $\sum B_g$, and $\sum C_g$, are nothing more than
the familiar within-groups sums, obtained by first summing within groups
then summing over groups; hence, the subscript w. The student should
convince himself that $C_w/A_w$ does not correspond to the regression
coefficient that would hold in case all the groups were combined, or thrown
together (thus yielding one scatterdiagram), with sums of squares and the
sum of products computed about the grand total means. It is also true
that the value of $C_w/A_w$ is not a simple average of the $C_g/A_g$ values.

Under the null hypothesis that the population slopes for the G groups
do not differ, we are in effect saying that a common slope holds for the G
populations. The $C_w/A_w$ is taken as the best estimate of this common
slope, an estimate that in no way depends on possible group differences in
the X and in the Y means; we need not assume equality of means. The
residual, $B_w - C_w^2/A_w$, about the regression line with slope $C_w/A_w$ will
have ($\sum m_g - G - 1$) degrees of freedom; G degrees of freedom are lost in
the calculation of $B_w$ and an additional degree of freedom is used up in
calculating the one slope, $C_w/A_w$. The df for $\sum(B_g - C_g^2/A_g)$ is simply the
sum of the dfs for the parts being summed; i.e., $\sum m_g - 2G$.

If all G slopes were exactly the same, each would equal $C_w/A_w$, and the
sum of the G residual sums of squares would be exactly the same as the
residual sum of squares in the “within” line. That is, $\sum(B_g - C_g^2/A_g)$
would equal $B_w - C_w^2/A_w$ exactly. But in practice, the G slopes will not
be the same, even when the population slopes are identical, simply because
of sampling errors. If it is recalled that for any sample, the slope $B_{yx}$
taken as $rS_y/S_x$, or the exact equivalent $\sum xy/\sum x^2 = C/A$, is that value of the
slope (of the regression line) which minimizes the residual sum of squares,
it is readily seen that the residual sum of squares for, say, group g will be
larger about the line with slope $C_w/A_w$ than about the line with slope $C_g/A_g$
(unless the two slopes happen to be equal). The same will hold for all
G groups simply because $C_w/A_w$ is not the optimum value for the separate
groups. The greater the divergence of the separate G slopes from $C_w/A_w$,
the larger the residual sum of squares on the “within” line compared to
the sum of the G residual sums of squares. That is, $B_w - C_w^2/A_w$ will be
larger than $\sum(B_g - C_g^2/A_g)$. This means that $B_w - C_w^2/A_w$ as a sum of
squares may have a source of variation which does not affect the sum of the
G separate residual sums of squares. That source is the possible differences
among the G regression coefficients, or slopes.

Accordingly, we may break down the residual sum of squares in the
“within” line into two parts: a within-groups residual about the separate
regression lines, or $\sum(B_g - C_g^2/A_g)$, plus differences among slopes. The
sum of squares for slopes is obtained by subtraction:
$(B_w - C_w^2/A_w) - \sum(B_g - C_g^2/A_g)$. Likewise, the df for the slopes part is
obtained by subtraction: $(\sum m_g - G - 1) - (\sum m_g - 2G) = G - 1$. Division of the
sum of squares for slopes by G − 1 will yield a variance estimate, $s_s^2$,
and division of $\sum(B_g - C_g^2/A_g)$ by its df will yield a within-groups residual
variance estimate. Then we have F, the ratio of $s_s^2$ to this within-groups
residual variance estimate, as a test of the differences among the G slopes.
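
The subtraction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `slope_homogeneity`, the dictionary keys, and the data are hypothetical, and A, B, C stand for the deviation sums $\sum x^2$, $\sum y^2$, and $\sum xy$ as used in the text.

```python
def slope_homogeneity(groups):
    """Test the differences among G regression slopes of Y on X.

    groups: list of (xs, ys) pairs, one pair of equal-length lists per group.
    A = sum of x^2, B = sum of y^2, C = sum of xy (all in deviation form).
    """
    a_w = b_w = c_w = 0.0   # pooled "within" sums
    resid_separate = 0.0    # sum over groups of B_g - C_g^2 / A_g
    n_total = 0
    for xs, ys in groups:
        m = len(xs)
        mean_x = sum(xs) / m
        mean_y = sum(ys) / m
        a_g = sum((x - mean_x) ** 2 for x in xs)
        b_g = sum((y - mean_y) ** 2 for y in ys)
        c_g = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        resid_separate += b_g - c_g ** 2 / a_g
        a_w += a_g
        b_w += b_g
        c_w += c_g
        n_total += m

    g = len(groups)
    resid_common = b_w - c_w ** 2 / a_w        # residual about the common slope
    ss_slopes = resid_common - resid_separate  # obtained by subtraction
    df_slopes = g - 1
    df_resid = n_total - 2 * g
    f_ratio = (ss_slopes / df_slopes) / (resid_separate / df_resid)
    return {
        "ss_slopes": ss_slopes,
        "df_slopes": df_slopes,
        "ss_resid": resid_separate,
        "df_resid": df_resid,
        "F": f_ratio,
    }


# Hypothetical data: three independent groups, five (X, Y) pairs each.
example = [
    ([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]),
    ([1, 2, 3, 4, 5], [1, 3, 6, 8, 12]),
    ([1, 2, 3, 4, 5], [5, 5, 4, 6, 5]),
]
print(slope_homogeneity(example))
```

The returned F is referred to the F table with G − 1 and $\sum m_g - 2G$ degrees of freedom.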

This test for the differences among slopes is general in that it is applicable
(1) when Y and X are both individual difference variables and the
G groups are independent or (2) when Y is regarded as dependent on X as
a manipulated variable and the G groups are independent and there is also
independence from level to level of X. If for the latter situation the levels
on X are equally spaced and identical for all G groups with the same
number of cases per level within a group, the computation of the $\sum xy$ terms
and the $\sum x^2$ terms can be simplified by using the coding system (the vs)
suggested on p. 348. It is preferable, although not required, to have equal
group Ns, i.e., equal $m_g$. When both the $m_g$ and the spacings of X are
equal, $C_w/A_w$ will be the simple average of the $C_g/A_g$. Otherwise, it is a
weighted average, $m_g$ being the weight for the gth group.

Slope differences, independent groups but correlated observations within 
groups. The scores may be arranged as in a three-way analysis of 
variance, with blocks for B independent groups that are measured under 
the B conditions (either qualitatively different or as B levels on a quantita¬ 
tive factor), with columns for C levels on a quantitative factor or as C 
trials in a learning task, and the rows for R individuals. The observations 
from column to column are (likely) correlated because we have repeated 
measures on each individual. In ordinary analysis of variance this setup 
is Case XV (p. 335) for which the test of the B x C interaction provides a 
test of the differences among the B trends (p. 337). Our present concern is 
the differences among the linear trends, or slopes, shown by the B groups. 

When this is of interest to the experimenter he should, for sake of
computational simplicity, have equal spacings for the C levels with exactly
the same levels for all B groups. (In the learning setup, it is usually tacitly,
perhaps gratuitously, assumed that trials constitute equal spacings.) The
method to be given here presumes equal spacings for the C levels. The
linear part of any possible trend for the bth group can be specified in terms
of the best fitting line to the successive C means (the
$\bar X_{\cdot b1}, \ldots, \bar X_{\cdot bc}, \ldots, \bar X_{\cdot bC}$) for the dependent variable, here designated as X. But the rth
individual in the bth group has C scores which permit the plotting of an
individual trend line which, in turn, may be described in part by a straight
line the slope of which will represent the linear component for the
individual’s trend. These individual slopes will always show variation from
person to person, hence are variates; we may regard the slope for an
individual as a sort of “score.” The average of these scores (slopes) for
individuals in the bth group will correspond to the slope for the bth group,
thus permitting us to regard the group slope as a mean. We will have B
such means.

If we code the C levels as vs, we can readily calculate a slope (for
the linear regression of X on the coded scores of the factor) for each
individual. It is simply $\sum vX/\sum v^2$ (no m here since each person has just one
X score at each of the C levels). The calculation of RB (or $N_t$, where
$N_t = \sum R_b$) different values of $\sum vX$ is greatly facilitated by having the v
values on a strip that can be placed just under the Xs in a row. The $\sum v^2$ is a
constant, the same for all individuals. For the present purpose the sizes
of the B groups need not be equal.

With the individual slopes calculated, the test of the significance of
the differences among the group slopes is not only easy to carry out but
also easy to conceptualize. When we regard the individual slopes as
“scores,” we have a simple one-way analysis of variance setup with a
breakdown of the total sum of squares (for the slopes) into between-group
and within-group sums of squares with B − 1 and RB − B (or $N_t - B$)
degrees of freedom, respectively. The computations for this part are by
formulas (15.6-15.8) or (15.9-15.11), with the individual slopes taken as
the Xs for those formulas. Actually, the calculations for the test of
significance can be made on the individual $\sum vX$ values as the “scores” for
the one-way analysis of variance since the $\sum v^2$ part of the individual slopes
is a constant. $F = s_b^2/s_w^2$ is the desired test for judging whether the group
slopes (or linear regressions) are heterogeneous.
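
A minimal Python sketch of this procedure follows, assuming equally spaced levels. The helper names (`linear_codes`, `slope_scores`, `one_way_f`) and the scores are hypothetical. The codes used here are centered with unit spacing, so for an even number of levels they are half-integers rather than the integer vs of the text; this rescales every slope by the same constant and leaves F unchanged.

```python
def linear_codes(c):
    """Centered, unit-spaced codes for c equally spaced levels."""
    return [j - (c - 1) / 2 for j in range(c)]


def slope_scores(rows, v):
    """One slope per individual: sum(vX) / sum(v^2)."""
    sum_v2 = sum(code * code for code in v)
    return [sum(code * x for code, x in zip(v, row)) / sum_v2 for row in rows]


def one_way_f(groups):
    """One-way analysis of variance on lists of scores; returns (F, df_b, df_w)."""
    scores = [s for g in groups for s in g]
    n, b = len(scores), len(groups)
    grand = sum(scores) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((s - sum(g) / len(g)) ** 2 for g in groups for s in g)
    return (ss_between / (b - 1)) / (ss_within / (n - b)), b - 1, n - b


# Hypothetical scores: two independent groups, three persons each, four trials.
raw = {
    "group 1": [[2, 4, 5, 7], [1, 3, 4, 4], [3, 4, 6, 9]],
    "group 2": [[2, 3, 3, 4], [4, 4, 5, 5], [1, 2, 2, 2]],
}
v = linear_codes(4)
slopes = [slope_scores(rows, v) for rows in raw.values()]
f_ratio, df_between, df_within = one_way_f(slopes)
print(slopes)
print(f_ratio, df_between, df_within)
```

Because each person contributes a single slope, the correlation among that person's repeated measures never enters the error term, which is why the ordinary one-way layout is adequate here.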

Slope differences, observations correlated two ways. The scores can
be arranged as in a complete three-way analysis of variance into B blocks,
C columns, and R rows, there being a total of just R individuals. Each
person has a score in every column and in every block, so there will be
intercolumn and also interblock correlation. Provided the C levels are
equally spaced, we can again code in order to compute $\sum vX/\sum v^2$ as an
individual’s slope for X on the column factor (independent variable) coded
as vs, but now each individual has B slopes since he has a set of C scores
in each block. The total number of individual slopes will be RB, and
either these slopes or the $\sum vX$ values ($\sum v^2$ again being a constant) can be
arranged into a new table with R rows and B columns (each of the block
conditions is now assigned to a column position for this new table). Thus
we have a two-way analysis of variance setup, mixed model, for which
$F = s_c^2/s_{rc}^2$ with the usual dfs will provide a test of the differences between
the B slopes (one for each block) since the means of the columns in the
new table correspond to the slopes for the B blocks, each block slope being
the mean of individual slopes. Due allowance has been made for any
possible (and likely) correlation between the blocks. The foregoing
analysis is possible because for equal spacings on the C factor the slope for
the bth block of scores is the same as the mean of the R individual slopes in
the block. The B blocks may stand for B qualitatively different conditions
or for B levels on a quantitative factor, with no requirement that the B levels
be equally spaced.

Now suppose that the B blocks represent B equally spaced levels on a
factor and that we wish also to consider the linear parts of the C trends
which are exhibited when we plot the appropriate means (the
$\bar X_{\cdot 1c}, \ldots, \bar X_{\cdot bc}, \ldots, \bar X_{\cdot Bc}$), separately for each of the C sets, against the B levels of
the block factor. (If the student is confused about how to pick out these
sets of means, in contrast with the sets used a couple of pages earlier, he
should refer to Table 16.9. For the earlier discussed trends for X against
the C factor, the means along the bottom of each block are used, whereas
for the present trends of X against the B factor, a mean is picked from
each block as one goes down a column.) It will be recalled that, so far as
differences in general trend are of interest, the testing of $s_{bc}^2$ against $s_{rbc}^2$
provides a basis for saying whether there are significant differences among
the B trends for X against the C levels and also for saying whether there are
significant differences among the C trends for X against the B levels (see
p. 347). However, for the linear components of the trends the test of the
differences among the B slopes does not simultaneously provide a test of
the differences among the C slopes. For the latter test we must calculate C
slopes, for X against the B levels, for each of the R individuals. The B
levels are now coded as vs. For the computation of the required $\sum vX$, RC
in number, the tedium of picking in turn each of the B appropriate scores
from the blocks may be avoided by first rearranging the entire table so as
to have the B scores for each person under the cth condition in a row. That
is, the original block and column designations are interchanged; if, for
example, the original blocks and columns stood for distances and
illumination levels, respectively, blocks and columns in the rearranged table
would stand for illumination levels and distances, respectively. Regardless
of the arrangement for computing the $\sum vX$, once we have them they
become the “scores” for a two-way analysis of variance, as before.


HYPOTHESES ABOUT CURVATURES 

The previously given test for curvilinearity merely permits us to accept or
reject the hypothesis that the fit of the observations (the means) to a straight
line is sufficiently close to be regarded either as within the limits of chance
or as showing discrepancies too large to be attributed to chance. Rejection
of the linearity hypothesis implies nothing about the possible form of the
curving relationship. Now the form of a relationship may at times be
predicted from theory, thus permitting us to go beyond the general
statement that Y depends on X, or $Y = f(X)$, to an equation involving a
specified (predicted) form for the relationship, such as $Y' = B \log X + A$
or $Y' = Ae^{BX}$ or $Y' = X/(A + BX)$, and so on. Or on the basis of a plot
of Y (or the Y means) against X we may proceed empirically. With
knowledge concerning the shapes of various mathematical curves we select the
form of the curve that might fit the observations. Whether the form is
arrived at from theory or empirically, we determine the numerical values
for the constants called for in the mathematical equation of that
form. Since the general problem of curve fitting is far beyond the scope
of this book, the author refers the reader to the excellent discussion of this
topic in Don Lewis’ Quantitative Methods in Psychology (New York).

Here we shall be concerned with going a step beyond the linearity test, to
the question of whether a significant departure from linearity is haphazard or shows
sufficient regularity to suggest that some type of systematic curvature is
present. This need not be empirical in that theory might predict that a
relationship involves an increasing, leveling off, then declining function
(or a decrease, leveling, then increase) or a rapid rise followed by a leveling
off. The theory might not be sufficiently well developed to permit
prediction of a more specific form for the relationship, particularly for
parts of the curve beyond (above or below) certain chosen levels for the
independent variable. In other words, we may merely predict that a
segment (that for the chosen levels of X) of the relationship between Y
and X should show curvature.

The argument in favor of proceeding to a curvature component of a
trend is similar to that given earlier (p. 350) for going from the ordinary
F test for between levels to the testing of the significance of possible
linear trend. We may here use the earlier illustration in which
the means for the dependent variable were 19, 26, 16, 23, 21 for
consecutive levels on the independent variable in contrast to 16, 19, 21, 23,
26 for successive levels. Although the identical Fs for between levels
might fail to reach significance, a significant effect might be claimed for
the second set via linear trend. Now suppose the means are 16, 19, 26, 23,
21 for the successive levels. A plot of these will show apparent systematic
curvature, but very little linear trend (near zero slope). Would such an
observed curvature prove to be a nonchance affair if tested by a method
that gives some consideration to the systematic curving trend? If so, could
X be claimed as having a significant effect on Y?

As a first approximation, we may regard a segment of a quadratic curve,
defined by the equation $Y' = A + BX + CX^2$, as fitting (maybe) the
segment of the “curve” based on available data. The quadratic component
resides in the $CX^2$ term, so in effect we have the question of whether C
differs from zero. It must be understood that rarely in psychological
research will we find a logical reason for predicting a quadratic form of
relationship between a dependent variable and an independent variable
over a wide range of values for the latter. The quadratic form is here used
merely as a basis for testing the hypothesis that some curvature exists
which, if taken into account, would explain a significant portion of the

Table 17.3. Coded values, u, for quadratic component of trends for 3 to 10 levels
on an independent variable

Number                          Level
of levels     1     2     3     4     5     6     7     8     9    10
    3        +1    -2    +1
    4        +1    -1    -1    +1
    5        +2    -1    -2    -1    +2
    6        +5    -1    -4    -4    -1    +5
    7        +5     0    -3    -4    -3     0    +5
    8        +7    +1    -3    -5    -5    -3    +1    +7
    9       +28    +7    -8   -17   -20   -17    -8    +7   +28
   10        +6    +2    -1    -3    -4    -4    -3    -1    +2    +6

between means (for levels) variance. Since the method does take into 
consideration the apparent systematic curvature, an effect is more apt to be 
detected than by the ordinary F test for between levels. 

When the levels on the independent variable are equally spaced with
an equal number of observations per level, there is a relatively simple way
for testing the quadratic component of the trend. The method involves the
use of a type of coded score, which we symbolize as u. The set of us to be
employed depends on the number of levels, and possesses the property that
$\sum u = 0$. Table 17.3 gives the values of u for 3 up to 10 levels. Once the us
appropriate for G (or C) levels have been arranged alongside the sum of the
dependent variable scores for the successive levels, we simply multiply
the $\sum Y$ for the gth group (or level) by $u_g$, thus finding G products, which
are then summed over groups to obtain $\sum u_g \sum Y_g$, or $\sum uY$. (Substitute X for
Y if X has been used to designate the dependent variable.) The sum of
squares for the quadratic component is given by $(\sum uY)^2/(m\sum u^2)$. Note the
similarity to the earlier given sum of squares for the linear component;
the only difference is in the coded values used. This new sum of squares
has df = 1; this df has to do with the one constant C in $Y' = A + BX
+ CX^2$ which controls the curvature, just as the linear component was
concerned with the one constant B in $Y' = A + BX$. With df = 1, the
value of $(\sum uY)^2/(m\sum u^2)$ is automatically a variance estimate, which we will
symbolize by $s_q^2$.

To test $s_q^2$ for significance, we need an error term appropriate to the
situation. If the observations are independent from level to level, the
error term is the ordinary $s_w^2$ of one-way analysis of variance. If the
observations are correlated (same persons measured at each level), we have
a two-way analysis of variance layout with C columns for C levels and R
rows for persons (m = R for foregoing indicated computations), and the
error term is $s_{rc}^2$.

No attempt will be made here to explain the derivation of the sets of
us in Table 17.3. Aside from the property that $\sum u = 0$ always, an
examination of any one set may help us understand more fully their use in testing
for a curvature component of trend. If the reader will plot any one set
of us, say for G = 5, against any imagined five equally spaced values for a
quantitative factor, X, he will have five points that follow a curve. Now
suppose the $\sum Y$ values for all five levels are identical; a plot of the five
corresponding Y means against the five X values will, of course, be a horizontal
line. With $\sum Y$ a constant from level to level, the linear component, using
the earlier defined vs (−2, −1, 0, +1, +2), will give
$\sum vY = \sum v_g \sum Y_g = \sum Y \sum v_g = 0$ since $\sum v_g = 0$. The linear component is zero, as it should
be for a zero slope. When we proceed to compute $\sum uY$ (equivalent to
$\sum u_g \sum Y_g$) with us of +2, −1, −2, −1, +2 (see Table 17.3), we have
$\sum Y \sum u_g$, which is also zero because $\sum u_g = 0$. No curvature when all five
Y means are identical! Now, just as the departure of $\sum vY$ from zero
indicates the presence of a linear component in the trend, the departure
of $\sum uY$ from zero indicates a quadratic component. If the five means were
such that the successive $\sum Y$ values were 10, 20, 30, 40, 50, we would have
$\sum uY = 2(10) - 1(20) - 2(30) - 1(40) + 2(50) = 0$. An obvious linear
component, but no curvature. (In this last example there was no constant
$\sum Y$ which could be taken from under the indicated summation over
groups.)
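
These properties of the coded values are easy to check numerically. The short Python sketch below uses the five-level vs and us quoted above; the function name `contrast` and the set of totals shaped like a parabola are illustrative assumptions.

```python
# Coded values for five equally spaced levels (linear v, quadratic u).
v = [-2, -1, 0, 1, 2]
u = [2, -1, -2, -1, 2]


def contrast(codes, totals):
    """Sum of code times level total, e.g. sum(vY) or sum(uY)."""
    return sum(c * t for c, t in zip(codes, totals))


constant_totals = [30, 30, 30, 30, 30]   # horizontal line
linear_totals = [10, 20, 30, 40, 50]     # straight line, no curvature
curved_totals = [4, 1, 0, 1, 4]          # hypothetical symmetric parabola

print(sum(v), sum(u), contrast(u, v))    # each set sums to zero; u and v are orthogonal
print(contrast(v, constant_totals), contrast(u, constant_totals))
print(contrast(v, linear_totals), contrast(u, linear_totals))
print(contrast(v, curved_totals), contrast(u, curved_totals))
```

A purely linear set of totals registers only on the vs, and a symmetric curved set registers only on the us, which is the sense in which the two components are separate.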

Let us again consider the three sets of five means mentioned earlier 
(p. 357): 

Set A 19 26 16 23 21 

Set B 16 19 21 23 26 

Set C 16 19 26 23 21

With each mean based on m cases, the required $\sum Y$ for group g will be
$m\bar Y_g$; hence the $\sum vY$ for a set will be $\sum vm\bar Y = m\sum v\bar Y$ and the $\sum uY$ for a set
would be $\sum um\bar Y = m\sum u\bar Y$.

For Set A we have for the linear component

$\sum vY = m[-2(19) - 1(26) + 0(16) + 1(23) + 2(21)] = +1(m)$

and for the quadratic component we have

$\sum uY = m[+2(19) - 1(26) - 2(16) - 1(23) + 2(21)] = -1(m)$

from which we see that both the linear and the quadratic components are
perhaps negligible (“perhaps” because no significance test has been
applied).

For Set B we have

$\sum vY = m[-2(16) - 1(19) + 0(21) + 1(23) + 2(26)] = 24(m)$

and

$\sum uY = m[+2(16) - 1(19) - 2(21) - 1(23) + 2(26)] = 0(m)$

for which we see a sizable linear component with no quadratic component.

For Set C we have

$\sum vY = m[-2(16) - 1(19) + 0(26) + 1(23) + 2(21)] = 14(m)$

and

$\sum uY = m[+2(16) - 1(19) - 2(26) - 1(23) + 2(21)] = -20(m)$

which indicates the possible presence of both components.

For sake of illustration, let us suppose that $s_w^2 = 71$ and that m = 10.
The between-groups sum of squares becomes 580, which with df of 4 leads
to an $s_b^2$ of 145. The simple F test for the differences among the five means
yields $F = s_b^2/s_w^2 = 145/71 = 2.04$, which for 4 and 45 degrees of freedom
does not quite reach the .10 level of significance. This F holds, of course,
for all three sets.

For Set A the linear component sum of squares is $(\sum vY)^2/(m\sum v^2) = 100/100
= 1$; hence the variance estimate for the linear component is 1, and F = 1/71,
which is far from significant. The quadratic component sum of squares is
$(\sum uY)^2/(m\sum u^2) = 100/140$, leading to $s_q^2$ of .7, thence F = .7/71, which is
likewise insignificant.

For Set B, in which there appears to be a definite linear trend, we have
the sum of squares for the linear component, $(\sum vY)^2/(m\sum v^2) = 57,600/100
= 576$, which yields F = 576/71 = 8.11, which is significant beyond the
.01 level. The quadratic sum of squares is zero.

For Set C the sum of squares for the linear component is 19,600/100,
leading to F = 196/71 = 2.76, which for dfs of 1 and 45 does not reach the
.10 level of significance. For the quadratic component sum of squares
we have 40,000/140 = 285.71, from which we get F = 285.71/71 = 4.02,
which is significant at the .05 level.

Admittedly, the foregoing starting means, with ms of 10 and $s_w^2$ of 71,
were contrived for a purpose: to show that the ordinary F ($s_b^2/s_w^2$ or
$s_c^2/s_{rc}^2$) test is not sensitive to possible systematic trends and that the
sensitivity of the statistical analysis for an effect can be improved by a
method that takes into consideration the systematic trend shown by the
data. This bonus in sensitivity is particularly deserved by the
experimenter who has made an a priori prediction either that the trend will
involve a linear component or that it will have simple curvature (or both).
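
The arithmetic for the three sets can be retraced with a short Python sketch. It takes the means, m = 10, and $s_w^2 = 71$ from the illustration above; the function and variable names are illustrative only, and no significance levels are computed.

```python
V = [-2, -1, 0, 1, 2]   # linear codes for five levels
U = [2, -1, -2, -1, 2]  # quadratic codes for five levels (Table 17.3)
M = 10                  # cases per level
S2_W = 71.0             # within-groups variance estimate assumed in the text

SETS = {
    "A": [19, 26, 16, 23, 21],
    "B": [16, 19, 21, 23, 26],
    "C": [16, 19, 26, 23, 21],
}


def between_ss(means, m):
    """Between-levels sum of squares from level means, m cases per level."""
    grand = sum(means) / len(means)
    return m * sum((y - grand) ** 2 for y in means)


def component_ss(codes, means, m):
    """(sum of code * level total)^2 / (m * sum of squared codes), df = 1."""
    weighted_total = m * sum(c * y for c, y in zip(codes, means))
    return weighted_total ** 2 / (m * sum(c * c for c in codes))


for name, means in SETS.items():
    f_between = (between_ss(means, M) / (len(means) - 1)) / S2_W
    f_linear = component_ss(V, means, M) / S2_W
    f_quadratic = component_ss(U, means, M) / S2_W
    print(name, round(f_between, 2), round(f_linear, 2), round(f_quadratic, 2))
```

All three sets share the same between-levels F, while the single-df components differ sharply from set to set, which is the point of the illustration.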

The foregoing methods of analysis can be extended in two directions. 
(1) Group differences in quadratic components can be tested in all those 
types of setups for which we have discussed tests of differences in linear 
components. Space does not permit the presentation of these seldom- 
needed extensions. (2) The between-groups (levels) sum of squares can be 
broken down into additional components—cubic, quartic, quintic, etc.— 
each with df of 1. Since these polynomial forms of relationship are scarce 
in the empirical data of psychology and are even more scarce in the minds 
of psychological theorists, there would seem to be no good reason for 
going beyond the second degree polynomial (the quadratic) in this business 
of extracting components, with its implication that lawful relationships 
among psychological variables somehow involve a cubic or higher order 
polynomial curve. Admittedly, the quadratic form of relationship may
rarely hold for psychological variables, but the testing of the quadratic
component does provide us with a more sensitive statistical test of an
effect than is possible by $F = s_b^2/s_w^2$ when systematic curvature is present
and/or has been predicted.



Chapter 18 

ANALYSIS OF VARIANCE: 
COVARIANCE METHOD 


It is usually possible in experimentation to choose, either by random 
methods or by pairing or matching, groups that are comparable on 
variables judged relevant to the comparisons to be made. There are times, 
however, when it is more practicable to use intact groups which may differ 
in important respects, and occasionally we may wish to make an
unanticipated comparison which does not seem justifiable in light of known
differences between groups. Experimental control is the ideal, but, if this 
cannot be attained, we may resort to statistical allowances and thereby 
arrive at valid conclusions. 

Suppose that two intact groups are being used to evaluate the relative 
merits of two methods of memorizing and that the mean IQ is 105 for 
group A and 111 for group B. Now, if there is an appreciable correlation 
between the particular memorizing ability involved and intelligence, the 
results will need qualifying because of the difference in intelligence of the 
two groups. It would seem logical to use the regression equation, for 
estimating memory score from intelligence, as a basis for predicting how 
much of a difference in memorizing would arise because of the group 
difference in IQs. Let us suppose that the mean memory performance is 
60 for group A and 70 for group B, and that substituting 105 and 111 in the 
regression equation yields a predicted value of 62 for group A and of 68 
for group B. Thus our prediction would lead us to expect a difference of 6 
points, and accordingly it would be said that 6 of the obtained difference 
of 10 could be attributed to lack of comparability of the two groups with 
respect to intelligence.

The next question concerns the proper sampling error to use in
evaluating the adjusted difference. It should be obvious that the ordinary
procedure is inapplicable for the simple reason that we have tampered with 
the obtained means and in so doing have interfered somewhat with the 
operation of chance. 

It is the purpose of this chapter to give a precise method for making 
allowance for an uncontrolled variable and to set forth the sampling error 
adjustment which is needed in testing the statistical significance of the 
difference between “corrected” means. The method is applicable whenever 
it seems desirable to correct a difference on a dependent variable for a 
known difference on another variable which for some reason could not 
be controlled by matching or by random sampling procedures. Since the 
scheme about to be proposed has an analysis of variance setting, the reader 
can readily guess that it will provide an adjustment for, and a test of 
significance of, the differences between two or more groups, and that it will 
be usable for either large or small samples. It is assumed that the dependent
variable has a distribution which does not depart too far from the normal
type and that the variances from group to group are similar.

In order to present the required adjustments, we need first to consider
covariance, which is defined as $\sum xy/N$ or $\sum(X - \bar X)(Y - \bar Y)/N$. The sum
of products of deviations can be broken down into components in a manner
similar to that used with a sum of squares. In the simplest situation we can
have m pairs of X and Y scores in each of G groups. These pairs of
scores can be recorded in some such fashion as that depicted in Table 18.1.


Table 18.1. Schema of scores for covariance

                              Group
     1             2         ...        g         ...        G
X_11  Y_11    X_12  Y_12     ...   X_1g  Y_1g     ...   X_1G  Y_1G
X_21  Y_21    X_22  Y_22     ...   X_2g  Y_2g     ...   X_2G  Y_2G
 ...           ...                  ...                  ...
X_i1  Y_i1    X_i2  Y_i2     ...   X_ig  Y_ig     ...   X_iG  Y_iG
 ...           ...                  ...                  ...
X_m1  Y_m1    X_m2  Y_m2     ...   X_mg  Y_mg     ...   X_mG  Y_mG

Note that $X_{ig}$ and $Y_{ig}$ stand for the X and Y values of the ith individual
in the gth group. Note also that in allowing i to take on values running
from 1 to m we do not imply any order for the individual, and that the
ith individual in one group is in no sense paired with the ith case in another
group. The product of the deviation scores for the ith individual in the gth
group would be $(X_{ig} - \bar X)(Y_{ig} - \bar Y)$, in which $\bar X$ and $\bar Y$ are the means for
all mG cases. The total sum of products would be $\sum_i \sum_g (X_{ig} - \bar X)(Y_{ig} - \bar Y)$.

Now each deviation can be expressed in terms of two components in
exactly the same way as in Chapter 15; i.e., one part is the deviation of the
score from the mean of the group to which it belongs, and the other part
is the deviation of the group mean from the total mean. Thus we have

$$(X_{ig} - \bar X) = (X_{ig} - \bar X_g) + (\bar X_g - \bar X)$$

and

$$(Y_{ig} - \bar Y) = (Y_{ig} - \bar Y_g) + (\bar Y_g - \bar Y)$$

Then the foregoing sum of the products becomes

$$\sum_i \sum_g [(X_{ig} - \bar X_g) + (\bar X_g - \bar X)][(Y_{ig} - \bar Y_g) + (\bar Y_g - \bar Y)]$$

When the bracketed expressions are multiplied together, four terms result,
and, since two of these vanish, we have left that the total sum of products is
equal to

$$\sum_i \sum_g (X_{ig} - \bar X_g)(Y_{ig} - \bar Y_g) + m\sum_g (\bar X_g - \bar X)(\bar Y_g - \bar Y)$$

The first of these terms involves a within-groups sum of products, whereas
the second is for between groups. If there happens to be an unequal
number of cases per group, the m of the second term goes under the summation
sign as $m_g$. The degrees of freedom for the total sum of products is
mG − 1, or N − 1, where N is the sum of the $m_g$s; the dfs for the within
and between terms are mG − G (or N − G) and G − 1, respectively.
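
The breakdown of the total sum of products can be verified numerically. The following Python sketch is an illustration only: the function name `product_sums` and the scores are hypothetical, and unequal group sizes are used so that $m_g$ appears as the weight in the between term.

```python
def product_sums(groups):
    """Return (total, within, between) sums of products of deviations.

    groups: list of (xs, ys) pairs, one pair of equal-length lists per group.
    """
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    n = len(all_x)
    grand_x, grand_y = sum(all_x) / n, sum(all_y) / n

    total = sum((x - grand_x) * (y - grand_y) for x, y in zip(all_x, all_y))
    within = 0.0
    between = 0.0
    for xs, ys in groups:
        m_g = len(xs)
        mean_x, mean_y = sum(xs) / m_g, sum(ys) / m_g
        within += sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        between += m_g * (mean_x - grand_x) * (mean_y - grand_y)
    return total, within, between


# Hypothetical scores for three groups of unequal size.
data = [
    ([3, 5, 4, 6], [10, 12, 11, 15]),
    ([7, 8, 6], [14, 13, 12]),
    ([9, 11, 10, 12, 8], [15, 18, 16, 20, 14]),
]
total, within, between = product_sums(data)
print(total, within, between, within + between)
```

The last two printed values agree with the first, showing that the two cross-product terms vanish as stated.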

It will be of convenience to assemble in a table the sums of products, 
along with the sums of squares, for both the X and Y variables. These will 
be found in the first three lines of Table 18.2. 

Although we are here presenting the covariance technique as a method
for making such adjustments as discussed in introducing this chapter, it is
of interest to link covariance with the problem of correlation. The product
moment correlation coefficient is usually defined as

$$r = \frac{\sum xy}{N S_x S_y}$$

which may be written as

$$r = \frac{\sum xy}{\sqrt{\sum x^2}\sqrt{\sum y^2}} = \frac{\sum (X - \bar X)(Y - \bar Y)}{\sqrt{\sum (X - \bar X)^2}\sqrt{\sum (Y - \bar Y)^2}}$$

or as a function of a sum of products and two sums of squares. Using the
sums of Table 18.2, we may specify three correlations: one based on the
total sums, one based on the within sums, and one based on the between
sums. These three correlations are indicated in line 5 by letters A, B, and
C, with appropriate subscripts used to designate the several sums in the first
three lines of the table. Line 5a gives the dfs for the rs.

Note that the between-groups r is actually the correlation between the X
means and the Y means for the groups. If this r is significant, it follows
that one source of the correlation for the total group is the heterogeneity
resulting from the throwing together of groups with unlike means. (This
between-groups correlation is meaningless when only two groups are
involved. Why?) Stated differently, an appreciable between-groups r
indicates that the total r is spurious; this spuriousness is eliminated when
r is computed from the within sums. The similarity of the within-groups r
to the partial correlation coefficient will be recognized by the discerning
student, especially if he recalls the derivation of the latter.
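
As an illustration of how a total r can be spurious, the Python sketch below computes the three correlations from the three sets of sums. The function names and the data are hypothetical and deliberately extreme. Following the notation of this chapter, A is taken as the sum of products, B as the sum of squares for X, and C as the sum of squares for Y.

```python
from math import sqrt


def abc_sums(groups):
    """Return {source: (A, B, C)} with A = sum xy, B = sum x^2, C = sum y^2."""
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    n = len(all_x)
    gx, gy = sum(all_x) / n, sum(all_y) / n
    total = (
        sum((x - gx) * (y - gy) for x, y in zip(all_x, all_y)),
        sum((x - gx) ** 2 for x in all_x),
        sum((y - gy) ** 2 for y in all_y),
    )
    a_w = b_w = c_w = 0.0
    for xs, ys in groups:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        a_w += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        b_w += sum((x - mx) ** 2 for x in xs)
        c_w += sum((y - my) ** 2 for y in ys)
    # Between sums follow by subtraction from the total sums.
    between = (total[0] - a_w, total[1] - b_w, total[2] - c_w)
    return {"total": total, "within": (a_w, b_w, c_w), "between": between}


def correlation(a, b, c):
    """r as a function of a sum of products and two sums of squares."""
    return a / sqrt(b * c)


# Hypothetical data: X and Y fall together within every group,
# yet the group means rise together.
groups = [
    ([1, 2, 3], [3, 2, 1]),
    ([11, 12, 13], [13, 12, 11]),
    ([21, 22, 23], [20, 19, 18]),
]
r = {name: correlation(*sums) for name, sums in abc_sums(groups).items()}
print(r)
```

Here the within-groups r is −1 while the total r is strongly positive, the positive value arising wholly from the unlike group means.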

We now turn to the use of covariance as a basis for allowing for the
influence of an uncontrolled variable on the differences between group
means. The question here is not what the result would be if the
uncontrolled variable were held constant, as in partial correlation, but rather
what the result would be if the groups were made comparable with respect
to the uncontrolled variable. Let X represent the dependent variable, and
Y the uncontrolled variable. It is presumed that the $\bar Y_g$ values differ, and
that X is correlated with Y in a linear fashion. For purposes of exposition
we shall refer to Table 18.2, which will serve as an outline of the required
computations. Line 6 of this table gives the regression coefficients ($b_{xy}$) for
predicting X from Y. Since no use will be made of $A_b/C_b$, it is bracketed;
it need not be computed.

That these A/C values are regression coefficients can readily be
demonstrated. In Chapter 9 the regression of X on Y was given as

$$b_{xy} = r\frac{S_x}{S_y}$$

Since, as we have seen previously,

$$r = \frac{\sum xy}{\sqrt{\sum x^2}\sqrt{\sum y^2}}, \quad S_x = \sqrt{\sum x^2/N}, \quad \text{and} \quad S_y = \sqrt{\sum y^2/N}$$

we have

$$b_{xy} = \frac{\sum xy}{\sqrt{\sum x^2}\sqrt{\sum y^2}} \cdot \frac{\sqrt{\sum x^2/N}}{\sqrt{\sum y^2/N}} = \frac{\sum xy}{\sum y^2} = \frac{A}{C}$$

In order to make allowance for the uncontrolled differences in $\bar Y_g$, we
need not only to adjust the $\bar X_g$ values but also to make an adjustment to the
error term, which is used as the denominator of the F ratio in testing the
difference between the adjusted X means. As in the simpler situation of
Chapter 15, F will involve the ratio of a between-groups to a within-groups
variance estimate.

First let us consider the method of making the adjustment to the total
and to the within-groups variance estimates. The problem here is that
of specifying how much of the variation in X can be predicted from variation
in Y and then of subtracting this to secure the left-over variation as an
adjusted value. But this left-over variance is nothing more than the
residual variance, or square of the standard error of estimate, obtainable
from formula (9.6):

$$S_{x \cdot y}^2 = S_x^2(1 - r^2)$$

Actually the adjustment is to be made to the sum of squares. In order to
state the residual variance in terms of sums, we may substitute for $S_x^2$ and $r^2$.
Thus,

$$S_{x \cdot y}^2 = \frac{\sum x^2}{N} - \frac{(\sum xy)^2}{(\sum x^2)(\sum y^2)} \cdot \frac{\sum x^2}{N}$$

hence,

$$NS_{x \cdot y}^2 = \sum x^2 - \frac{(\sum xy)^2}{\sum y^2}$$

Since $NS^2$ always equals a sum of squares, the value of $NS_{x \cdot y}^2$ is obviously
the sum of squares for the residuals. In the notation of this chapter,

$$NS_{x \cdot y}^2 = \sum_i \sum_g (X_{ig} - \bar X)^2 - \frac{\left[\sum_i \sum_g (X_{ig} - \bar X)(Y_{ig} - \bar Y)\right]^2}{\sum_i \sum_g (Y_{ig} - \bar Y)^2}$$

would be the residual sum of squares after the regression adjustment. This
sum can be written as

$$NS_{x \cdot y}^2 = B_t - \frac{A_t^2}{C_t}$$

which is the entry for the total group in line 7 of Table 18.2. Similarly,
the corresponding residual, or adjusted, sum of squares for within groups
is $B_w - A_w^2/C_w$.

At first thought it would seem logical to adjust $B_b$ by the use of $A_b$
and $C_b$, but the between-groups correlation (and regression) is affected by
the differences between the X means, which are the differences to be
adjusted and then tested for statistical significance. Our adjustment should
be one which is independent of the differences to be tested. This suggests
that the regression for within groups, or $A_w/C_w$, should be used, since the
regression for the total is also affected by the difference which we are out to
test. Insofar as we are concerned solely with the adjustment of the
between-groups X means, the best adjustment would be by means of the
within-groups regression. This could take the form of either an adjustment to
the between-groups sum of squares for X or a direct adjustment to the several
X means.
an'tfferTth 1 ^ W ° U ' d ^ th ® b ® St Way of ascertain >ng how much of 
an effect the noncomparability of the groups with respect to Y had on the 

means, there is another consideration as to whether the within regression 

is appropriate for adjusting the between-groups sum of squares. 8 It will 

be recaUed that * is to be taken as the ratio of a variance'estimate teld 

on the between sum of squares to that based on within groups, and that 

mate N arian f e est ' mates /eing so compared must be independent esti¬ 
mates. Now if we adjust both the within and the between sum of squares 
by means of the same regression coefficient (say, that based on within 
groups), any sampling error in this regression coefficient would have a 
similar effect on both adjustments; hence it could not be argued that the 

resul ,ng adjusted sums of squares ^ ^ “ 

Therefore variance estimates based thereon would not be strictly indepen- 

This difficulty is overcome by taking the adjusted sum of squares for
between groups as the difference between the adjusted total sum and the
adjusted within sum of squares. Thus, for the purpose of testing
significance,

$$\left(B_t - \frac{A_t^2}{C_t}\right) - \left(B_w - \frac{A_w^2}{C_w}\right)$$

leads to the proper adjustment for the between sum of squares for X.

Perhaps the reader has anticipated that the dfs may change as a result of
these manipulations. The new dfs are recorded in line 8 of Table 18.2.
Note that the df for the between sum has not changed, since the adjustment
was not made by using the between-groups regression.

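In addition to the usual methods for calculating sums of squares, we need
formulas for computing sums of products in terms of raw scores. The
following formulas are written for unequal $m_g$ values, but are of course
applicable for equal ms.

$$\sum_i \sum_g (X_{ig} - \bar{X})(Y_{ig} - \bar{Y}) = \sum_i \sum_g X_{ig} Y_{ig} - \frac{\left(\sum_i \sum_g X_{ig}\right)\left(\sum_i \sum_g Y_{ig}\right)}{N} \qquad \text{for total} \quad (18.1)$$

$$\sum_i \sum_g (X_{ig} - \bar{X}_g)(Y_{ig} - \bar{Y}_g) = \sum_i \sum_g X_{ig} Y_{ig} - \sum_g \frac{\left(\sum_i X_{ig}\right)\left(\sum_i Y_{ig}\right)}{m_g} \qquad \text{for within} \quad (18.2)$$

$$\sum_g m_g (\bar{X}_g - \bar{X})(\bar{Y}_g - \bar{Y}) = \sum_g \frac{\left(\sum_i X_{ig}\right)\left(\sum_i Y_{ig}\right)}{m_g} - \frac{\left(\sum_i \sum_g X_{ig}\right)\left(\sum_i \sum_g Y_{ig}\right)}{N} \qquad \text{for between} \quad (18.3)$$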
Thus to compute the sums of products of deviations, we need the sum of
all N raw score products or $\sum_i \sum_g X_{ig} Y_{ig}$, the sum of all the Xs or
$\sum_i \sum_g X_{ig}$, the sum of all the Ys or $\sum_i \sum_g Y_{ig}$, the sum of the Xs
separately for each group or $\sum_i X_{ig}$, and the sum of the Ys for each
separate group or $\sum_i Y_{ig}$. Adding the several X sums gives the sum of all
the Xs; likewise for Ys. Note that to get the second term of (18.2), or the
first term of (18.3), we must divide the product of the two sums for a group
by its m and then sum such quotients over all G groups. The reader may find
some interest in comparing formulas (18.1-18.3) with formulas (15.9-15.11),
and it should be apparent that in the case of equal ms formulas (18.1-18.3)
can be written in the simpler way of formulas (15.6-15.8).

Table 18.3. Score data and sums based on raw scores for analysis of variance by
covariance adjustments

                   Group 1           Group 2           Group 3
                  Y       X         Y       X         Y       X
                 14      10        11       5         7       5
                  9       6         9       2         6       4
                 11       8         8       6         2       1
                 12       6        10       5        10       7
                 10       9        10       4         7       9
                 11       7        10       8         7       4
                 11       9        12      10         6       5
                  8       5         9       6         3       2
                 11       6        10       4         2       2
                 12       7        11       6         9       5
Sum             109      73       100      56        59      44
Mean           10.9     7.3      10.0     5.6       5.9     4.4
ΣY² or ΣX²     1213     557      1012     358       417     246
ΣXY                 810               571               307

Sums over all groups: ΣΣX = 173; ΣΣY = 268; ΣΣX² = 1161; ΣΣY² = 2642;
ΣΣXY = 1688; Σ(ΣX)² = 10,401; Σ(ΣY)² = 25,362.
Grand means: X̄ = 5.77; Ȳ = 8.93.

The required computations are illustrated by using the data (fictitious)
of Table 18.3, which contains Y and X scores for ten cases in each of
three groups. The scores in each of the six columns are separately summed
to yield 109, 73, etc. The scores are squared and summed to yield 1213,
557, etc. Summing the products of the X and Y values gives 810, 571, and
307 for the three groups. Summing over groups yields the double
summations 173, 268, etc. Certain of these sums are then substituted into
formulas (18.1-18.3) to secure the total, within, and between sums of
products of deviations. By substituting the proper sums into formulas
(15.6-15.8), we get the required sums of squares for the Xs and for the Ys.
Then these three sets of sums are entered as the first three rows of Table
18.4, which follows the pattern set forth in Table 18.2.
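
The computations just described can be checked with a short script. The following is only a sketch: the function name `partition` and the list layout are our own choices, not part of the text, and the sums of squares are obtained by passing the same variable twice, which is assumed to be equivalent to formulas (15.6-15.8).

```python
# Raw scores of Table 18.3, one list per group.
Y = [
    [14, 9, 11, 12, 10, 11, 11, 8, 11, 12],
    [11, 9, 8, 10, 10, 10, 12, 9, 10, 11],
    [7, 6, 2, 10, 7, 7, 6, 3, 2, 9],
]
X = [
    [10, 6, 8, 6, 9, 7, 9, 5, 6, 7],
    [5, 2, 6, 5, 4, 8, 10, 6, 4, 6],
    [5, 4, 1, 7, 9, 4, 5, 2, 2, 5],
]


def partition(u, v):
    """Total, within, and between sums of products of deviations.

    Follows formulas (18.1)-(18.3); partition(u, u) gives sums of squares.
    """
    n_total = sum(len(group) for group in u)
    raw = sum(a * b for gu, gv in zip(u, v) for a, b in zip(gu, gv))
    correction = sum(map(sum, u)) * sum(map(sum, v)) / n_total
    by_group = sum(sum(gu) * sum(gv) / len(gu) for gu, gv in zip(u, v))
    return {
        "total": raw - correction,        # (18.1)
        "within": raw - by_group,         # (18.2)
        "between": by_group - correction, # (18.3)
    }


A = partition(X, Y)  # line 1 of Table 18.4: sums of products
B = partition(X, X)  # line 2: sums of squares for X
C = partition(Y, Y)  # line 3: sums of squares for Y

for label, row in (("A", A), ("B", B), ("C", C)):
    print(label, {key: round(value, 2) for key, value in row.items()})
```

Because the between value is obtained from the same two correction terms as the total and within values, the three entries in each row necessarily satisfy total = within + between.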


Table 18.4. Analysis of variance for X variable of Table 18.3 by covariance
adjustments for uncontrolled Y

                            Total            Within            Between
1. Sum of products         142.53             72.70              69.83
2. Sum of squares: X       163.37            120.90              42.47
3. Sum of squares: Y       247.87            105.80             142.07
4. df                          29                27                  2
5. Correlation               .709              .643               .912
5a. df for r                   28                26                  1
6. b_xy value               .5750             .6871
7. Adjusted Σx²             81.42   minus     70.95   equals     10.47
8. df                          28                26                  2


Before proceeding to the covariance adjustment, let us consider the
means given in Table 18.3. It will be noticed that the groups differ
considerably on X, or the dependent variable, and that they also differ on
Y, the relevant but not controlled variable. An analysis of variance based
on the sum of squares for the Xs leads to a between-groups variance
estimate of 42.47/2, or 21.24, and a within-groups estimate of 120.90/27, or
4.48. The F for testing the significance of the between-groups variance
becomes 21.24/4.48, or 4.74, which for the given dfs is significant at about
the .02 or .03 level of significance. This analysis does not, of course, allow
for the fact that the groups differ on Y. If there is correlation between X
and Y, the observed differences on X may be mainly a reflection of the
group differences on Y. As previously stated, the purpose of the
covariance adjustment is to make statistical allowance for such uncontrolled
differences.

By following the steps indicated in Table 18.2, we determine the values
in lines 5 to 7 of Table 18.4. Note that the adjusted Σx² for between groups,
10.47, is secured by subtracting 70.95 from 81.42. The analysis of variance
based on the adjusted sums of squares (for the Xs) gives a between-groups
variance estimate of 10.47/2, or 5.23, and a within-groups estimate of
70.95/26, or 2.73. Then F = 5.23/2.73 = 1.92, which for 2 and 26 degrees
of freedom yields a P of about .20. Accordingly, it cannot be concluded
that there are significant group differences on X over and above those which
would be expected because of the differences on Y.
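
The adjustment and the resulting F can be retraced from the first three lines of Table 18.4. This is a minimal sketch; the dictionary layout and the helper name `adjusted` are illustrative assumptions. Because it starts from the rounded table entries, its results may differ from the printed ones in the second decimal place.

```python
# Lines 1-3 of Table 18.4 (rounded values as printed).
A = {"total": 142.53, "within": 72.70}    # sums of products
B = {"total": 163.37, "within": 120.90}   # sums of squares for X
C = {"total": 247.87, "within": 105.80}   # sums of squares for Y
N, G = 30, 3                              # cases and groups


def adjusted(source):
    """Residual sum of squares for X after regression on Y: B - A^2 / C."""
    return B[source] - A[source] ** 2 / C[source]


adj_total = adjusted("total")
adj_within = adjusted("within")
# The between value is taken by subtraction, not from its own regression.
adj_between = adj_total - adj_within

df_between = G - 1
df_within = N - G - 1    # one further df is lost to the regression coefficient

F = (adj_between / df_between) / (adj_within / df_within)
print(round(adj_total, 2), round(adj_within, 2), round(adj_between, 2), round(F, 2))
```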

It should be obvious that the use of the covariance adjustment method
must be justified by logical and experimental considerations. When it is
logical to control a variable by pairing or matching, the covariance
adjustment is defensible as a way of making proper allowance for a failure,
because of infeasibility, to control the variable. The use of the covariance
adjustment is not predicated on the degree of correlation between the
dependent and the uncontrolled variable. If the correlation is relatively
low, the adjusted values will differ but little from the unadjusted values;
if high, both the total and within adjusted variances will differ considerably
from the unadjusted variances, but, as we shall presently see, the extent to
which the adjusted and unadjusted between-groups variances differ is not
solely a function of the correlation.

It is of interest to make an actual adjustment of the X means of Table
18.3 for the group differences on Y. The adjustments can be made by

$$\bar{X}_{ga} = \bar{X}_g - b_w(\bar{Y}_g - \bar{Y})$$

in which $\bar{X}_{ga}$ is the adjusted value for the gth group, and $b_w$ is the
within-groups regression coefficient. For the data of Table 18.3 we have

$$\bar{X}_{1a} = 7.30 - .687(10.90 - 8.93) = 5.95$$

$$\bar{X}_{2a} = 5.60 - .687(10.00 - 8.93) = 4.86$$

$$\bar{X}_{3a} = 4.40 - .687(5.90 - 8.93) = 6.48$$

Should the reader be surprised that the adjustment puts group three
ahead, he should ponder the fact that, relative to the within-groups X and Y
variances, the third group’s X of 4.40 was not as far below the means of the
other two groups as was its Y of 5.90.
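
The same adjustment of means takes only a few lines of code. This sketch uses the unrounded within-groups coefficient and grand mean, so a result may differ from the hand computation above by a unit in the second decimal; the variable names are our own.

```python
b_w = 72.70 / 105.80             # within-groups regression coefficient, A_w / C_w
x_means = [7.30, 5.60, 4.40]     # group means on X (Table 18.3)
y_means = [10.90, 10.00, 5.90]   # group means on Y (Table 18.3)
y_grand = 268 / 30               # grand mean on Y

# Adjusted X mean for each group: X_g - b_w * (Y_g - Y)
adjusted_means = [x - b_w * (y - y_grand) for x, y in zip(x_means, y_means)]
print([round(m, 2) for m in adjusted_means])
```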

From a careful consideration of the foregoing, it will be seen that the 
covariance adjustment method will not necessarily reduce the differences 
between the means on the dependent variable. Situations arise in which 
groups that show marked differences on some correlated but uncontrolled 
variable may yield similar means on the variable being studied. Suppose 
that we are using two intact groups to investigate the relative merits of 
two learning methods, and that the initial means of the two groups are 
markedly different. We would, accordingly, expect a difference on final 
standing even though the two methods were equally efficacious. If this 
expected difference is not found, it follows that the method used by the 
group with the lower initial score was more effective in that this group 
overtook the other group. With groups differing on an uncontrolled 
variable, it is not only as proper, but also as necessary, to use the covariance 
technique when the groups are nearly the same on the dependent variable 
as when they are different. For such situations the adjustment will increase
the between-groups variance. The adjusted variances are sometimes
referred to as “reduced” variances, but it follows from the foregoing that
this term may be a misnomer for the adjusted between-groups variance.

The extent to which the adjusted variances lead to a level of significance 
different from that based on an analysis of the unadjusted values will 
obviously depend on three things: the degree of correlation between the 
dependent and uncontrolled variable, the size of the differences between 
the groups on the uncontrolled variable, and the found differences on the 
dependent variable. The applicability of the covariance technique does 
not depend on a minimum degree of correlation or on a definite amount 
of group differences on the uncontrolled variable. But, if the within-
groups correlation is low and/or there is only a small, chance difference
between the groups on the uncontrolled variable, the use of the covariance
adjustment may not be worth the effort. Obviously, if a variable correlates
near zero with the dependent variable, it need not be controlled experi¬ 
mentally or statistically. 

The covariance method can be extended to make adjustments for group 
differences on more than one uncontrolled variable. This involves the use 
of multiple regression, but computationally it is perhaps simpler to handle
the adjustments in terms of multiple rs. We need two multiple correlation
coefficients, one obtained by way of correlations based on within-groups 
sums of squares and of products, and the other by way of correlations based 
on total sums of squares and of products. 

If, for example, allowance is to be made for three uncontrolled variables,
$Y_1$, $Y_2$, and $Y_3$, we will need six (one for each pair of variables; X is
the fourth or dependent variable) auxiliary tables consisting of entries like
those in lines 1, 2, and 3 under the “total” and the “within” columns of
Table 18.2 (or Table 18.4). We can then calculate two sets of
intercorrelations (each auxiliary table will lead to two rs when the
substitutions called for in line 5 of Table 18.2 are made) among the four
variables, and from these we compute, by the methods set forth in Chapter 11,
two multiple $R^2$ values. Let us designate the multiple based on the total
sums as $R_t$ and that based on the within sums as $R_w$.

With these two multiple rs available, we may rewrite line 7 of Table 18.2
as

$$B_t(1 - R_t^2) \ \text{minus} \ B_w(1 - R_w^2) \ \text{equals adjusted } B_b$$

with respective dfs of

$$N - n, \qquad N - G - n + 1, \qquad G - 1$$

for the n variable problem (one dependent, plus the number of uncontrolled
variables included in the adjustments).
Remark: The use of the covariance adjustment technique is far superior
to attempts at pairing individuals from the intact groups on the basis of
one or more uncontrolled variables, a procedure which inevitably leads to a
reduction of sample size and also runs astride a regression difficulty (see

Evaluation of changes. In Chapter 6 we discussed the usually advocated
method for comparing changes shown by experimental and control
groups (applicable also for two experimental groups). We have, with i and
f standing for the pretest and posttest measures and E and C standing for
experimental and control groups,

$$D = D_E - D_C = (\bar{X}_{fE} - \bar{X}_{iE}) - (\bar{X}_{fC} - \bar{X}_{iC})$$

as the net change, the change shown by the experimentals corrected for
that shown by the controls. We may rearrange the Xs, yet maintain the
numerical value of $D = D_E - D_C$, as follows:

$$D = (\bar{X}_{fE} - \bar{X}_{fC}) - (\bar{X}_{iE} - \bar{X}_{iC})$$

from which it is seen that the net change may also be thought of as the 
final difference between the two groups corrected for their initial difference. 
Such a correction involves the assumption that each unit of difference in
initial standing will produce a unit of difference in final standing. In other
words, this type of adjustment implies a 1-to-1 relationship between initial
and final scores. Since a perfect correlation is never found or approached 
in practice, one may question whether the usual procedure of comparing
changes is justified.

It is, of course, entirely logical that group differences on final scores,
which we may here call the dependent variable, should be adjusted for the
group differences on initial standing as an uncontrolled variable. The
covariance adjustment technique provides a way of correcting final means
for initial differences, with due allowance for the degree of correlation
between initial and final scores. The ordinary and the covariance method
differ not only in the correction but also in the resultant sampling error.
The ordinary technique uses a standard error which definitely includes,
either explicitly or implicitly, the variance for both initial and final scores
and the correlation of initial with final, whereas the error term used in the
covariance method is a direct function of the degree of correlation and of
the variance for the final scores only. In other words, the net differences
being tested are not the same, and neither are the error terms the same.
The covariance method will, in general, be more sensitive. The student
should read Professor R. A. Fisher’s discussion on this point.*

* Chapter IX in Fisher, R. A., Design of experiments, London: Oliver and Boyd.
The F, t, and z techniques are sometimes referred to as parametric
because they involve the estimation of at least one parameter (population
value); for example, a population variance estimate is needed as the error
term for F. These techniques also involve, at the derivation stage, an
assumption of normality of distribution (of some variate). The techniques to
be presented in this brief chapter are sometimes called nonparametric because
they do not involve the estimation of parameters, and they are sometimes
referred to as distribution-free because no assumptions are made about the
distribution of the variate. Advocates of nonparametric tests usually do not
stress their nonparametric character as much as their distribution-free
property. It is said that distribution-free methods should be used because
the assumption of normality, on which parametric tests are based, may not
hold. But in light of Norton’s study (p. 252) and Boneau’s results (p. 106)
the worry about violating this assumption seems ill-founded.

Another argument advanced in favor of nonparametric methods springs
from the levels of measurement achieved by measuring instruments. These
levels are referred to as nominal, ordinal, interval, and ratio scales of
measurement. The so-called nominal scale is not a scale at all since it
involves nothing more than classifying individuals into categories that
are qualitatively different, with no ordering implied. Numbers may be
used for coding the categories, and frequencies may be expressed as
proportions or percentages. The ordinal scale connotes an ordering with
rank-order positions usually specified by numbers. An interval scale is
attained when equal units can be claimed; e.g., when the interval 140-150
represents exactly the same amount as the interval 110-120. Such is the case if
we are measuring length in terms of inches or weighing objects in terms of
pounds. Without giving any reasons, we will make the dogmatic-sounding
assertion that very little “measurement” in psychology involves equal
units; the scales and tests provide a basis for ordering which may be
regarded as much better than subjective rank-ordering. For a scale to be
called a ratio scale it must have a true zero point, in addition to qualifying
as an interval scale.

Now it is claimed by some that the level of measurement definitely 
restricts possible statistical treatment of data. It is easy to see that adding 
numbers, used to code qualitatively different categories, in order to com¬ 
pute a mean is nonsensical. It is easy to see that a ratio scale is required 
for a meaningful coefficient of variation, (S/M). It is easy to see that rank 
positions as “scores” will lead to absurd standard deviations—if ten 
persons are ranked the ranks range from 1 to 10 whereas if twenty-five 
persons are ranked the rank scores range from 1 to 25—the amount of 
variation is a direct function of N, hence a σ or an S is not descriptive of
group variation. And it is easy to see that the adding of scores that do not 
qualify as being on an interval scale may make the mathematical purist 
dubious about the precise descriptive property of the mean, variance, and 
product moment r, all of which call for addition. 

The crucial question, however, is whether or not the F, t, and z tests 
can, in view of their dependence on means and variances, be safely used 
when the scale of measurement is, as is the rule in psychology, somewhere 
between the ordinal and the interval scales. The question boils down to 
this: Will Fs, ts, and zs follow their respective theoretical sampling
distributions when the underlying scores are not on an interval scale ? The 
answer to this is a firm yes provided the score distributions do not markedly 
depart from the normal form. Nowhere in the derivations purporting to 
show that various ratios will have sampling distributions which follow 
either the F or the t or the normal distribution does one find any reference 
to a requirement of equal units. The attaining of an interval scale of 
measurement, though desirable for some reasons, will not alter the risks 
of type I and type II errors when statistical inferences are made. 

There is, of course, no denying the fact that the type of data available 
does dictate the type of statistical technique that can be used. We have 
already discussed methods for handling nominal data—either by the 
binomial or by $\chi^2$, both of which may be called distribution-free methods
because no assumptions are made about the distribution of the variable or 
variables underlying the categories. Data in the form of ranks may force 
one to use Spearman’s rho or Kendall’s tau or the tests to be presented 

later in this chapter. 

In general, distribution-free methods, when applied for comparative 
purposes to data which are normal or nearly normal, are not as sensitive 
(that is, as powerful for avoiding type II errors) as the appropriate z,
t, or F technique. Consequently, in using a nonparametric method as a 
short-cut, we are throwing away dollars in order to save pennies. 

The sign test. Perhaps the simplest of all distribution-free methods
is the “sign” test, which is applicable for testing the difference between
two correlated sets of scores. The procedure is to consider the N pair
differences, $X_1 - X_2$, some of which will be plus, some minus (with an
occasional zero). If there is no difference between the two sets of scores
we would expect the plus and minus signs to be equally divided. To test
whether there are more plus signs than reasonable on a chance basis, the
binomial, $(p + q)^N$ with p = .50, is used (N is for the pair differences
having a sign; it is the sample size less the number of zero differences)
in the manner discussed earlier (pp. 46-48). For effective N larger than
10 we may use either the normal curve approximation to the binomial
(pp. 44-46) or the $\chi^2$ approximation (pp. 209-12). Whether we use the
binomial itself or one of the approximations, we must take care to secure
a P that represents whichever test, one-tailed or two-tailed, is
appropriate for the hypothesis being tested.
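
As a sketch of the exact binomial form of the sign test, the following may be helpful. The function name `sign_test`, the `two_tailed` switch, and the before and after scores are all hypothetical; the one-tailed P returned is for the hypothesis of more plus signs than chance allows.

```python
from math import comb


def sign_test(first, second, two_tailed=True):
    """Exact sign test on paired scores using the binomial with p = .50."""
    # Zero differences carry no sign and are dropped from the effective N.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    n = len(diffs)
    plus = sum(d > 0 for d in diffs)
    # Probability of at least (upper) or at most (lower) `plus` plus signs.
    upper = sum(comb(n, j) for j in range(plus, n + 1)) / 2 ** n
    lower = sum(comb(n, j) for j in range(0, plus + 1)) / 2 ** n
    p = min(1.0, 2 * min(upper, lower)) if two_tailed else upper
    return plus, n, p


# Hypothetical pretest and posttest scores for ten persons.
before = [12, 15, 9, 14, 11, 13, 10, 16, 12, 14]
after = [14, 17, 9, 13, 15, 16, 12, 18, 15, 17]

plus, n, p = sign_test(after, before)
print(plus, n, round(p, 4))
```

Here one pair shows a zero difference, so the effective N is 9 rather than 10.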

The “median” test. A procedure for testing the difference between 
two sets of independent scores is to use the median for the two groups 
combined as a basis for dichotomizing. This leads to a fourfold table: 
above vs. below (the median) on one axis, group vs. group on the other. 
Then the $\chi^2$ test for the fourfold table may be employed, with Yates’
correction if necessary. With very small Ns the exact probability method
(pp. 236-39) would be used. The idea back of the median test is simply
that two samples drawn from two populations having the same median 
should yield equal splits. In practice, difficulties are sometimes encountered 
in attempting to dichotomize exactly at the median. When the median 
is an integer and several scores are equal to the median, the dichotomy 
can be taken as those scores which exceed the median vs. those which do 
not exceed the median. 

Median test for more than two independent groups. This is a straight¬ 
forward extension of the median test to provide an over-all test of 
the differences between, say, C independently drawn groups. On the basis 
of the median of the distribution of the C groups combined, the scores 
are dichotomized (as near the median as possible). This will lead to a 
2 by C table from which one may obtain a $\chi^2$ with C - 1 degrees of
freedom. 

Whether we are dealing with two groups or with C groups, the Ns for the 
groups need not be equal for use of the median test. 
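
The median test for two or more groups can be sketched as below. The function name `median_test` and the scores are hypothetical, the dichotomy is taken as "exceeds the median" versus "does not exceed", and no Yates' correction is applied, so for a fourfold table with small frequencies the value would need that correction or the exact method.

```python
from statistics import median


def median_test(*groups):
    """Chi square for the 2 by C table formed by splitting at the pooled median."""
    pooled = [score for group in groups for score in group]
    med = median(pooled)
    above = [sum(score > med for score in group) for group in groups]
    not_above = [len(group) - a for group, a in zip(groups, above)]
    n = len(pooled)
    chi2 = 0.0
    for row in (above, not_above):
        row_total = sum(row)
        for observed, group in zip(row, groups):
            expected = row_total * len(group) / n
            if expected > 0:   # an empty row contributes nothing
                chi2 += (observed - expected) ** 2 / expected
    return med, above, not_above, chi2, len(groups) - 1


# Hypothetical scores for three independently drawn groups.
g1 = [12, 15, 11, 18, 14, 16, 13, 17]
g2 = [9, 10, 14, 8, 11, 7, 12, 10]
g3 = [13, 9, 15, 12, 10, 16, 11, 14]

med, above, not_above, chi2, df = median_test(g1, g2, g3)
print(med, above, not_above, round(chi2, 3), df)
```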

Mean and variance for rank scores. Since the next four tests are 
based on ranks, we now digress to consider the mean and variance of the 
distribution of ranks, a rectangular distribution running from 1 to N when
N persons have been ranked. Let X stand for any rank score. A little
thought leads to the conclusion that $\bar{X} = (N + 1)/2$, hence
$\sum X = N\bar{X} = N(N + 1)/2$ as the sum of the rank scores. In some college
algebra textbooks there is proof that the sum of the squares of the first N
natural numbers is given by N(N + 1)(2N + 1)/6. This gives us the value of
$\sum X^2$. When the foregoing given values for $\sum X$ and $\sum X^2$ are
substituted into the general formula (3.6) for the sum of squares of
deviations about the mean, we find after simplification that
$\sum x^2 = (N^3 - N)/12$, hence the variance,

$$S^2 = \frac{\sum x^2}{N} = \frac{N^2 - 1}{12} \qquad (19.1)$$

It is no accident that 6, or ½ of 12, appears in the formula for rank-
difference correlation, and that 12 appears in Sheppard’s correction to S
for the grouping error. Why in the latter?
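
Formula (19.1) and the mean (N + 1)/2 are easily verified by brute force. This sketch, with a function name of our own choosing, computes both directly from the ranks 1 to N for the two sample sizes mentioned earlier in the chapter.

```python
def rank_moments(n):
    """Mean and variance of the ranks 1..n, computed directly."""
    ranks = range(1, n + 1)
    mean = sum(ranks) / n
    variance = sum((r - mean) ** 2 for r in ranks) / n
    return mean, variance


for n in (10, 25):
    mean, variance = rank_moments(n)
    # Compare with the closed forms (N + 1)/2 and (N^2 - 1)/12.
    print(n, mean, (n + 1) / 2, variance, (n * n - 1) / 12)
```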

Mann-Whitney U test. This test, which is applicable only to results
based on two independent groups, involves rank ordering the scores, for
the two groups combined, from greatest (rank 1) to least (for which the
rank will be $N = N_1 + N_2$ unless there are ties for the bottom position).
When ties occur, each person involved is assigned the average of the ranks
that would be assigned in case the tied persons could be differentiated
(see p. 203). Then the ranks so assigned are summed separately for each
group. Let $T_1$ and $T_2$ represent these two sums. (As a check on the
arithmetic, $T_1 + T_2$ should equal N(N + 1)/2, the sum of the first N natural
numbers.)

When both $N_1$ and $N_2$ are 8 or greater, the statistic

$$U_1 = N_1 N_2 + \frac{N_1(N_1 + 1)}{2} - T_1 \qquad (19.2)$$

is distributed normally about a chance expected value, or mean, given
by $N_1 N_2 / 2$, and with variance of $N_1 N_2 (N_1 + N_2 + 1)/12$. We then have

$$z = \frac{U_1 - N_1 N_2 / 2}{\sqrt{N_1 N_2 (N_1 + N_2 + 1)/12}}$$

as a unit normal deviate by which the significance of U as a deviation from
the null hypothesis expected value is determined. If, as an alternate, we
define U by replacing $T_1$ with $T_2$ and $N_1$ with $N_2$ (in the second term), we
will have $U_2$. Now $U_1$ and $U_2$ will deviate to the same extent, but in
opposite directions, from $N_1 N_2 / 2$.
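
The steps for U and its normal deviate can be sketched as follows. The function names and the two sets of scores are hypothetical; ranks run from greatest (rank 1) to least, with tied scores given the average rank, as the text prescribes.

```python
from math import sqrt


def average_ranks(scores):
    """Map each score to its rank (greatest = rank 1), averaging over ties."""
    positions = {}
    for position, score in enumerate(sorted(scores, reverse=True), start=1):
        positions.setdefault(score, []).append(position)
    return {score: sum(p) / len(p) for score, p in positions.items()}


def mann_whitney(group1, group2):
    rank_of = average_ranks(list(group1) + list(group2))
    n1, n2 = len(group1), len(group2)
    t1 = sum(rank_of[s] for s in group1)
    t2 = sum(rank_of[s] for s in group2)
    # Arithmetic check from the text: T1 + T2 = N(N + 1)/2.
    assert abs(t1 + t2 - (n1 + n2) * (n1 + n2 + 1) / 2) < 1e-9
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - t1     # formula (19.2)
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - t2
    z = (u1 - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, u2, z


# Hypothetical scores, eight cases per group.
group1 = [23, 19, 27, 21, 25, 30, 22, 28]
group2 = [18, 20, 15, 17, 24, 16, 14, 19]

u1, u2, z = mann_whitney(group1, group2)
print(u1, u2, round(z, 3))
```

Note that the two U values always sum to the product of the two group sizes, which is another way of stating that they deviate equally, in opposite directions, from the expected value.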

When $U_1$ is larger than $N_1 N_2 / 2$, the direction of the difference between
the two sets of scores is such that group 1 is superior to group 2. (If ranks
are assigned with the least score as rank 1, and so on, the value of $U_1$ will
be smaller than $N_1 N_2 / 2$ when group 1 is superior.) For $N_1$ and $N_2$ less
than 8, special tables are required for judging the significance of U.

Kruskal-Wallis one-way analysis of variance by ranks. This test is
applicable for testing the difference between G independent groups with
varying numbers, $m_g$, of cases per group. All N ($N = \sum m_g$) scores are
ranked with a rank of 1 assigned to the lowest score and a rank of N to
the highest. The group identity of the cases is maintained so that we
can sum the rank scores within each group, which sum we will designate as
$T_g$ for the gth group. Then the quantity

$$H = \frac{12}{N(N + 1)} \sum_g \frac{T_g^2}{m_g} - 3(N + 1) \qquad (19.3)$$

is computed. Under null conditions (no difference in averages for the
populations), and for all $m_g$ greater than 5, the sampling distribution of
H follows closely the $\chi^2$ distribution with df = G - 1.

When there are sets of ties, with $t_s$ the number of cases tied in the
sth set, it is necessary to apply a correction to H:

$$H_c = \frac{H}{1 - \dfrac{\sum_s (t_s^3 - t_s)}{N^3 - N}}$$

The corrected value will be higher than the uncorrected value and will
therefore tend to help us reject the null hypothesis. If H is significant,
we would not bother to compute $H_c$.
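
A sketch of formula (19.3) and the tie correction follows. The function name and the scores for the three groups are hypothetical; rank 1 goes to the lowest score and tied scores receive the average rank.

```python
def kruskal_wallis(*groups):
    """H of formula (19.3), the tie-corrected H, and the df."""
    pooled = sorted(score for group in groups for score in group)
    n = len(pooled)
    positions = {}
    for position, score in enumerate(pooled, start=1):
        positions.setdefault(score, []).append(position)
    rank_of = {score: sum(p) / len(p) for score, p in positions.items()}

    rank_sums = [sum(rank_of[s] for s in group) for group in groups]   # T_g
    h = 12 / (n * (n + 1)) * sum(
        t ** 2 / len(group) for t, group in zip(rank_sums, groups)
    ) - 3 * (n + 1)

    # Each set of t tied cases contributes t^3 - t (untied scores contribute 0).
    tie_term = sum(len(p) ** 3 - len(p) for p in positions.values())
    h_corrected = h / (1 - tie_term / (n ** 3 - n))
    return rank_sums, h, h_corrected, len(groups) - 1


# Hypothetical scores, six cases per group.
g1 = [27, 31, 29, 35, 33, 30]
g2 = [22, 25, 27, 24, 26, 21]
g3 = [30, 34, 36, 32, 38, 37]

rank_sums, h, h_c, df = kruskal_wallis(g1, g2, g3)
print(rank_sums, round(h, 3), round(h_c, 3), df)
```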

Friedman two-way analysis of variance by ranks. This test is for the
mixed model situation where the columns stand for C experimental
conditions (or levels on a factor) and the rows stand either for R individuals,
each measured under the C conditions, or for R sets of matched
persons, with random assignment within a set to the C conditions. The X
scores within each row are assigned ranks, 1 to C. The ranks are then summed
by columns, with $T_c$ as the sum for the cth column, and the quantity

$$\chi^2_r = \frac{12}{RC(C + 1)} \sum_c T_c^2 - 3R(C + 1) \qquad (19.4)$$

is computed. The designation of this as a chi square implies that its
sampling distribution is approximated by that of chi square. When C ≥ 3 and
R > 9, or C > 4 and R > 4, the random sampling values of $\chi^2_r$ tend to
follow approximately the chi square distribution, with df = C - 1.

The rationale for the Friedman test is, briefly, as follows. The assigning
of ranks within rows as 1 to C reduces all row means to the same value,
(C + 1)/2, thus “taking out” row differences. (Recall that for the ordinary
two-way F test a row sum of squares was extracted.) Under null conditions
that the original X scores all belong to the same population, i.e., that there
is no column effect, we would expect the rank scores, RC in number, to be
distributed evenly over the columns in such a way that the distributions
within columns would be the same. Therefore, under null conditions the
column means for the rank scores would tend to be the same (that is,
we would expect equal $T_c$) and the within column variances would also
tend to be the same.

Next we note that with all the row means equal to (C + 1)/2, the mean
of means, or total mean, would also be (C + 1)/2. When we consider the
general expression for the sum of squares for the R x C interaction,
$\sum\sum (X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X}_{..})^2$, we see that when
$\bar{X}_{r.} = \bar{X}_{..}$, this sum of squares reduces to
$\sum\sum (X_{rc} - \bar{X}_{.c})^2$, which is nothing more than the sum of
squares within columns. But under null conditions the distribution of
ranks within any one column will, of course, be rectangular (the rank
scores will run from 1 to C within a column) with variance that can be
specified theoretically as $(C^2 - 1)/12$. Strictly speaking, the distribution
of ranks within all columns, as theoretically specified under null conditions,
cannot be exactly the same unless R is a multiple of C.

The $\chi^2_r$ of the Friedman test is an $F = s^2_c / \sigma^2_{rc}$, in which $s^2_c$
has C - 1 degrees of freedom and $s^2_{rc}$ has been replaced by the theoretical
population variance, or a variance with df = ∞. As usual, when $n_2 = \infty$,
F becomes a $\chi^2 / n_1$, hence $\chi^2_r$ is $n_1$ times an F. Ties of ranks
within rows do not disturb the Friedman test, and it is claimed that the
Friedman test agrees very closely with an F test applied to the original X scores.
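
Formula (19.4) can be sketched in code as follows. The function name and the scores for ten persons under three conditions are hypothetical. Rank 1 is given here to the lowest score in a row, which is our assumption; reversing the direction of ranking leaves the statistic unchanged.

```python
def friedman(table):
    """Column rank sums, chi square of formula (19.4), and df.

    `table` holds R rows (individuals or matched sets) of C scores each.
    """
    r, c = len(table), len(table[0])
    column_sums = [0.0] * c
    for row in table:
        ordered = sorted(row)
        for j, score in enumerate(row):
            first = ordered.index(score) + 1   # lowest rank the score could take
            count = ordered.count(score)       # number of cases sharing the score
            column_sums[j] += first + (count - 1) / 2   # average rank over ties
    chi2_r = 12 / (r * c * (c + 1)) * sum(t ** 2 for t in column_sums) - 3 * r * (c + 1)
    return column_sums, chi2_r, c - 1


# Hypothetical scores: ten persons (rows) under three conditions (columns).
scores = [
    [8, 11, 13],
    [7, 9, 12],
    [10, 9, 14],
    [6, 10, 11],
    [9, 12, 10],
    [5, 8, 9],
    [11, 13, 15],
    [8, 7, 12],
    [6, 9, 10],
    [7, 11, 9],
]

column_sums, chi2_r, df = friedman(scores)
print(column_sums, round(chi2_r, 2), df)
```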

Kendall’s coefficient of concordance, W. Suppose C judges each rank
order R individuals, and we wish a measure of the agreement among the
judges. Arrange the rankings into a table of R rows and C columns. The
rank scores in the cth column will run from 1 to R (except when ties occur
for either the top or bottom position, which is unlikely in practice). Sum
across columns and enter the several sums along the right-hand margin. We
might regard these sums as “scores” for the R individuals. If there were
perfect agreement among the judges, these sums would range from C to RC, with
in-between values of 2C, 3C, ..., (R - 1)C, though not necessarily in that
order. Consider the variance of these sum scores. Since each sum is
simply C times a rank, with the ranks running from 1 to R, the variance of
these sum scores will be $C^2$ times the variance of the first R natural
numbers. The maximum variance possible will be

$$\sigma^2_{max} = \frac{C^2(R^2 - 1)}{12}$$
The same value can be obtained by considering the variance of the
sums in terms of the variance theorem. The variance of the sum scores
will be equal to the sum of the separate C variances, one for each column
and all identical, i.e., all will equal $(R^2 - 1)/12$, plus a sum of C(C - 1)/2
correlational terms of the form $2r_{12}\sigma_1\sigma_2$. Under the condition of
perfect agreement among the judges, all of these correlational terms will
become $2(1)\sigma_1\sigma_2$, but since all columns have the same variance,
$\sigma_1$ will equal $\sigma_2$ and each correlational term will become
$2(1)\sigma^2$, in which $\sigma^2 = (R^2 - 1)/12$. Summing C(C - 1)/2 such terms
yields $C(C - 1)\sigma^2$, which when added to the sum of the C variances (i.e.,
added to $C\sigma^2$) gives

$$\sigma^2_{max} = C\sigma^2 + C(C - 1)\sigma^2 = C^2\sigma^2 = C^2(R^2 - 1)/12$$

In practice, perfect agreement will rarely if ever occur. As a measure
of the extent of agreement, Kendall proposed that the variance of the
obtained sums be taken relative to the maximum possible variance.
Accordingly, the coefficient of concordance is defined as

$$W = \frac{S^2_T}{\sigma^2_{max}} \qquad (19.5)$$

in which

$$S^2_T = \frac{R \sum_r T_r^2 - \left(\sum_r T_r\right)^2}{R^2}$$

with $T_r$ as the total (or sum) for the rth row. Obviously, W can never be
negative; it will be 1 when the agreement is perfect. The value of W tends
to be higher than the average of all possible Spearman rank difference
correlations between the judges. When ties occur, the denominator term
for W becomes $\sigma^2_{max} - \frac{C}{12R} \sum_s (t_s^3 - t_s)$, in which $t_s$, the
number of cases tied in the sth set of ties, will take on values 2, 3, 4, ...,
and all the sets irrespective of their column location are included.

For R > 7, W may be tested for significance by

$$\chi^2_W = \frac{12 R S^2_T}{CR(R + 1)} \qquad (19.6)$$

which follows approximately the $\chi^2$ distribution with R - 1 degrees of
freedom. A significant W may be interpreted in two ways: either as 
indicative of better than chance agreement among the C judges or as a 
significant (reliable) difference among the R sums (or the R possible means) 
for the R individuals. But mere statistical significance may not be as 
crucial as the knowledge that W is fairly high. 
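A minimal Python sketch of the significance test follows, assuming the statistic has the form $12RS^2 / [CR(R + 1)]$ with $S^2$ the variance of the row sums (divisor R). The function name `chi_square_w` and the ranks are hypothetical, chosen only so that R exceeds 7.

```python
from typing import List, Tuple


def chi_square_w(ranks: List[List[int]]) -> Tuple[float, int]:
    """Return the chi-square statistic for W and its degrees of freedom.

    ranks[r][c] is the rank given by judge c to individual r (no ties).
    """
    R = len(ranks)                                # individuals (rows)
    C = len(ranks[0])                             # judges (columns)
    totals = [sum(row) for row in ranks]          # row sums T_r
    s2 = (R * sum(t * t for t in totals) - sum(totals) ** 2) / R ** 2
    chi2 = 12 * R * s2 / (C * R * (R + 1))
    return chi2, R - 1


# Hypothetical ranks: 8 individuals (rows) ranked by 3 judges (columns).
ranks = [
    [1, 2, 1],
    [2, 1, 3],
    [3, 3, 2],
    [4, 4, 5],
    [5, 6, 4],
    [6, 5, 6],
    [7, 8, 7],
    [8, 7, 8],
]

chi2, df = chi_square_w(ranks)
print(f"chi-square = {chi2:.3f} with {df} degrees of freedom")
```

The value obtained would then be referred to a chi-square table with R - 1 degrees of freedom.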

There is a direct connection between the Friedman test and the significance
test for Kendall’s W, although as significance tests they differ as to
purpose. Friedman’s test is concerned with the significance of the differences
among the C column averages, the number of which is usually
very small, whereas the test of W is concerned with the significance of the
differences among the R row averages, the number of which is usually not
very small. Friedman’s $\chi^2_r$ is, typically, used to test for the effect of a
fixed constants factor, whereas typically $\chi^2_W$ tests for the significance of a
random factor (individual differences) although applicable to the ranking of
C objects by judges. Kendall’s W provides a useful descriptive measure of
agreement among judges, but such a measure is not a relevant part of the
Friedman technique.

It can be shown by simple algebra that the test for W can be written in
the alternate form:

$\chi^2_W = \frac{12}{CR(R + 1)}\sum_r T_r^2 - 3C(R + 1)$ (19.7)

which bears a marked resemblance to the expression for $\chi^2_r$. Simply
transpose the roles assigned to the rows and columns and also interchange
the R and C designations in $\chi^2_r$ and you have $\chi^2_W$.
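The equivalence of the two forms can be checked numerically. The sketch below assumes the forms $12RS^2 / [CR(R + 1)]$ and $[12 / (CR(R + 1))]\sum T_r^2 - 3C(R + 1)$; the function names and the row totals are hypothetical. The second form presumes untied ranks, so that the totals add to $CR(R + 1)/2$.

```python
from typing import List


def chi_square_from_variance(totals: List[int], C: int) -> float:
    """Chi-square for W computed from the variance of the row sums."""
    R = len(totals)
    s2 = (R * sum(t * t for t in totals) - sum(totals) ** 2) / R ** 2
    return 12 * R * s2 / (C * R * (R + 1))


def chi_square_from_totals(totals: List[int], C: int) -> float:
    """Chi-square for W computed directly from the squared row sums."""
    R = len(totals)
    return 12 / (C * R * (R + 1)) * sum(t * t for t in totals) - 3 * C * (R + 1)


# Hypothetical row sums for 8 individuals ranked by 3 judges (no ties);
# they add to C * R * (R + 1) / 2 = 108, as untied ranks require.
totals = [4, 6, 8, 13, 15, 17, 22, 23]
C = 3

a = chi_square_from_variance(totals, C)
b = chi_square_from_totals(totals, C)
print(f"{a:.4f} {b:.4f}")
```

The second form avoids computing the variance and needs only the sum of the squared row totals.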



Chapter 20 

REMARKS ON ERROR 
REDUCTION 


In this brief chapter we shall attempt a summary and integration of 
implications, scattered through this book, having to do with the reduction 
of error variance in psychological research. In a sense, this is an extension 
of an earlier discussion (pp. 84-86). Some of the additional concepts and
techniques could not have been introduced at that time since an understanding
of them is dependent on material presented in the intervening chapters.

For our present purpose, we shall subsume errors under three headings:
measurement or observational errors, errors in inferring population
parameters in field or survey studies, and errors in experimental testing
of hypotheses. About the first of these, we remark only that errors of
measurement can be reduced by developing more reliable tests or (when 
feasible) by averaging repeated measurements. 

FIELD STUDIES (SURVEYS) 

Surveys for the purpose of gauging opinion, and studies designed to 
establish normative data, require large scale sampling. The aim is to 
secure a sample which is unbiased, that is, representative of a defined 
population, with chance sampling errors as small as possible. We shall 
limit ourselves to three sampling methods: random, stratified, and area.

Random sampling. The conditions of random sampling have been 
specified earlier (p. 51). By the method of random sampling it is fairly 
easy to arrive at a representative sample, provided the universe has been 
catalogued. Thus, if we wish a representative sample of school children
[...] exactly satisfy the conditions of random sampling, [...] population
[...] that sheer [...] errors when the random method is being employed.
That sheer sample size is not enough to reduce nonrandom errors is
evidenced by the Literary Digest straw polls, which rested on the assumption
that the population of telephone subscribers [...] its voting preference
from the entire population of potential voters. [...] failure is attributed
to the alignment of voting to income levels, an alignment that did not
exist in prior years.

[...] stratification [...] by the sample, approximately by

[...] (20.1)

where P equals the proportion in the total sample, N, who possess the
attribute, Q = 1 - P, [...]

[...] shows that the magnitude of the error is less for a stratified [...]
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For stratified sampling, the variance of the mean may be written as 


$S^2_{\bar{X}} = \frac{1}{N}(S^2 - S^2_M)$ (20.2)

where $\bar{X}$ = the sample mean, $S^2$ = the sample variance, and $S^2_M$ = the
weighted variance of the means of the several strata about the total sample
mean. If stratification has been accomplished by use of a variable, Y,
which is related to the variable being studied, the formula may be
written in the form

$S^2_{\bar{X}} = \frac{1}{N}(S^2_x - r^2_{xy}S^2_x)$ (20.3)

It will be noticed that stratified sampling does lead to greater precision in
the sense of smaller chance error, but only when the control or stratifying
variable is related to the variable being studied.
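The effect of stratification on the sampling variance of a mean can be sketched in Python. The sketch assumes that the two formulas take the forms $(S^2 - S^2_M)/N$ and $(S^2_x/N)(1 - r^2_{xy})$, and that the variance of the strata means equals $r^2_{xy}S^2_x$ when the stratifying variable correlates $r_{xy}$ with the variable studied. The function names and the numbers are hypothetical.

```python
def var_mean_random(s2: float, n: int) -> float:
    """Sampling variance of a mean under simple random sampling."""
    return s2 / n


def var_mean_stratified(s2: float, s2_strata_means: float, n: int) -> float:
    """Sampling variance of a mean, given the weighted variance of strata means."""
    return (s2 - s2_strata_means) / n


def var_mean_stratified_r(s2: float, r_xy: float, n: int) -> float:
    """Sampling variance of a mean, given the correlation with the stratifier."""
    return (s2 / n) * (1 - r_xy ** 2)


# Hypothetical values: sample variance 225 (S = 15), N = 400.
s2, n = 225.0, 400

for r_xy in (0.0, 0.3, 0.5, 0.8):
    v = var_mean_stratified_r(s2, r_xy, n)
    print(f"r_xy = {r_xy:.1f}: variance of mean = {v:.4f} "
          f"(random sampling: {var_mean_random(s2, n):.4f})")
```

With no correlation the stratified and random figures coincide, which is the point made in the text: the gain depends on the relation between the stratifying variable and the variable being studied.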

The quota method involves the use of strata, but selection within the
strata is not done on a random basis; the field worker merely fills a quota
by securing the correct proportion per stratum, so that selective factors
leading to bias can easily operate.

Area sampling. There is evidence that area or “pin point” sampling
is the best method yet devised for drawing samples in survey studies. Its
[...] depends on the availability of extensive facilities. The
student who is interested in this, or the stratified, method will wish to turn
to detailed treatments of the subject.*

* Yates, F., Sampling methods for censuses and surveys. New York: Hafner, 1949.
Deming, W. E., Some theory of sampling. New York: Wiley, 1950.

SAMPLING ERRORS IN EXPERIMENTATION

Selection of groups for experimental purposes can be accomplished
(1) by random sampling, i.e., the random assigning of individuals to the
groups, (2) by pairing, (3) by using sibs or litter mates, (4) by matching
distributions, and (5) by using the same person under all the experimental
conditions. [...] fatigue [...]

For methods 2, 3, and 5 the statistical analysis is by way of the analysis
of variance (mixed model) with rows standing for the matched persons or
litters or individuals, respectively, for the three methods. The F test of the
differences among the correlated means (the Ms for conditions) involves an
error term which is freed of the row variation; stated differently, the error
term (an estimate of a two-way interaction) tends to be small if the
correlations between the matched persons or sibs or between scores on the
same persons are large. The foregoing argument holds, of course, for just
two experimental groups (or an experimental and a control group) as well
as for three or more groups.

Thus compared to method 1 (random assignment), greater precision is
attainable by using method 2 or 3 or 5. Before discussing method 4, let us
again consider the situation where groups are needed for just two conditions.
If the groups are formed by pairing individuals, the sampling
variance of the difference between the two means is, as we learned in
Chapter 6, given by

$S^2_D = S^2_{\bar{X}_1} + S^2_{\bar{X}_2} - 2r_{12}S_{\bar{X}_1}S_{\bar{X}_2}$ (6.8)

The gain in pairing, over random assignment, depends on the magnitude
of $r_{12}$. It can be shown that if the pairing is done on the basis of variable Y
the value of $r_{12}$ will be $r^2_{xy}$, and in case two or more variables are controlled
by pairing, $r_{12}$ will be the square of the multiple correlation between the
dependent variable, X, and the control variables. The reason for pairing,
it will be recalled, is to make the groups comparable on certain variables
which might affect the outcome of the experiment. We now see explicitly
that the advantage of pairing depends definitely on how highly the variables,
so controlled, are correlated with the dependent variable. No correlation,
no gain; low correlation, little gain.
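The dependence of the gain on the correlation can be made concrete with a small Python sketch. It assumes the usual form of formula (6.8), the sum of the two squared standard errors less $2r_{12}$ times their product, and takes $r_{12} = r^2_{xy}$ as stated in the text. The function names and the standard errors are hypothetical.

```python
def var_diff_paired(se1: float, se2: float, r12: float) -> float:
    """Sampling variance of the difference between two correlated means."""
    return se1 ** 2 + se2 ** 2 - 2 * r12 * se1 * se2


def var_diff_paired_on_control(se1: float, se2: float, r_xy: float) -> float:
    """Same, when pairing is on a control variable Y, so that r12 = r_xy squared."""
    return var_diff_paired(se1, se2, r_xy ** 2)


# Hypothetical standard errors of the two means.
se1 = se2 = 2.0
unpaired = var_diff_paired(se1, se2, 0.0)   # random assignment: no correlation term

for r_xy in (0.0, 0.3, 0.8):
    paired = var_diff_paired_on_control(se1, se2, r_xy)
    print(f"r_xy = {r_xy:.1f}: paired variance = {paired:.2f}, "
          f"unpaired variance = {unpaired:.2f}")
```

Because $r_{12}$ is the square of $r_{xy}$, a control variable correlating .3 with the dependent variable removes only a small fraction of the error variance.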

Method 4 is another way of making groups comparable on pertinent
variables. Instead of pairing persons, distributions are matched for the Y
variable, to be controlled, in such a manner that the two groups contain
the same proportions of cases in the several intervals as hold for a supply
distribution on Y. The sampling variance of the difference between the
two X means is given by

$S^2_D = S^2_{\bar{X}_1}(1 - r^2_{x_1 y}) + S^2_{\bar{X}_2}(1 - r^2_{x_2 y})$ (20.4)

If the matching has been made on the basis of several control variables, 
the two correlations (one for each group) become the multiple rs between 
X and the control variables. 
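A minimal Python sketch of formula (20.4) follows, assuming each term has the form $(S^2/N)(1 - r^2)$ for its own group. The function name, the argument layout, and the numbers are hypothetical. The group sizes are passed separately because nothing in the formula requires them to be equal.

```python
def var_diff_matched(s1: float, n1: int, r1: float,
                     s2: float, n2: int, r2: float) -> float:
    """Sampling variance of the difference between two X means when the
    groups have been matched as to distribution on a control variable.

    s1, s2: standard deviations of X in the two groups
    n1, n2: numbers of cases in the two groups
    r1, r2: correlations between X and the control variable in each group
    """
    term1 = (s1 ** 2 / n1) * (1 - r1 ** 2)
    term2 = (s2 ** 2 / n2) * (1 - r2 ** 2)
    return term1 + term2


# Hypothetical values: S = 10 and r = .6 in both groups.
equal_groups = var_diff_matched(10.0, 25, 0.6, 10.0, 25, 0.6)
larger_control = var_diff_matched(10.0, 25, 0.6, 10.0, 100, 0.6)

print(f"25 experimental, 25 control:  {equal_groups:.2f}")
print(f"25 experimental, 100 control: {larger_control:.2f}")
```

Enlarging only the second group lowers the variance of the difference, though the first group's term sets a floor that no amount of additional cases in the second group can remove.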

From (20.4) we may deduce the following fact. Where two groups have 
been separately matched as to distribution on the same control variable(s), 
the standard error of the difference can be obtained without the restriction 
of the ordinary pairing procedure, which requires that there be an equal 
number of cases in the two groups. The reader will note that either term
in formula (20.4) is, as might be expected, identical to formula (20.3) for
the sampling variance of a mean when the stratified method is used. The
method of matching distributions is particularly useful when the cost per
case is much greater in the experimental group than in the control group.
Precision can be increased by taking a larger control group, a possibility
also when the groups are chosen by randomization.
The use of paired individuals for experimental (and control) conditions
has long been recognized as a sound procedure. We might argue, however,
that the advantages of pairing have been overstressed. The gain in error
reduction may not be appreciable. The advocates of pairing say that they
are not willing to risk randomization as a method for setting up groups,
but it should be noted that there are always numerous variables which
might affect the outcome of an experiment that are never controlled except
by randomization. Thus we can seldom, if ever, completely avoid placing
faith in the randomization process. Random differences between groups
never have more than a random effect on the results; the error formulas
always include all random effects. When pairing leads to only a slight
reduction in error, we have evidence that the pairing procedure may not
have been worth the effort involved.

It should be noted that an original group which is split into experimental 
groups either by the random method or by pairing must be regarded as 
representative of some defined universe, and that such conclusions as are 
drawn from the experiment cannot be generalized unless it can be shown 
that the defined universe is representative of the generality of mankind with 
respect to the variables being studied. In other words, those who use the 
college sophomore as a laboratory representative of mankind have not 
avoided, by showing that selective factors did not render their experimental 
groups noncomparable, the necessity of bridging the gap between the 
sophomore’s behavior and that of the typical human being. 

At this point, we remind the student that the covariance adjustment 
method (Chapter 18) is an entirely legitimate technique for allowing for 
uncontrolled variables and at the same time reducing error variance. 

It is appropriate to end this discussion (and the text) with an example
of an experiment in which error reduction might have been achieved by
judicious planning. The Lanarkshire milk experiment in Scotland involved
the daily feeding of three-fourths of a pint of raw milk to 5000 children and
an equal amount of pasteurized milk to another group of 5000 over a
period of 4 months. These 10,000, plus a control group of 10,000, were
measured for height and weight at the beginning and the end of the
4-month period. Since the purpose of the experiment was to check on the
relative merits of raw vs. pasteurized milk, the control group was
nonessential. (It is an interesting commentary on the magic of the word
“control” that very frequently a control group is used when not needed.)
Despite large numbers, the feeder and control groups were not comparable
as regards initial height and weight, the operating selective factor being
the benevolent attitude of school teachers who apparently thought the
research would not be harmed if preference was given frail, undernourished
children in choosing individuals for the feeder groups. Either a carefully
supervised random, or a definite pairing, procedure would have avoided
this selective bias, but what is more important and more relevant to our
present topic is the claim in a paper† by “Student,” so far not refuted, that
the use of 50 pairs of identical twins would have yielded as precise
information at only 2 per cent of the cost of the original experiment, or at a
saving of approximately 35,000 prewar dollars.

† “Student,” The Lanarkshire milk experiment, Biometrika, 1931, 23, 398-406.




EXERCISES AND QUESTIONS* 


CHAPTER 2 

2.1. a. Make separate frequency distributions for the marks of the two groups 
of students in Table I. Use intervals of size 5. 
b. Determine also the cumulative frequencies for each group. 


Table I. Final examination marks for a class in statistics

Students with No Calculus (N = 36)

103  150  139   79  150  134
 98   79   94  137  118  113
106   93  106  137   91  109
 71  101   92   74  106   87
108  113  103  108  114  105
120   95   83   93  109   97

Students with Some Calculus (N = 22)

137  139  112  139
151  124   80  153
131   94   96   77
133  123  101  115
115   90  154  122
111  135


2.2. a. Make separate frequency distributions for the two groups of scores in
Table II. Use intervals of size 3.
b. Determine also the cumulative frequencies for each group.

2.3. a. Draw a frequency polygon for the distribution in Table III, part A.
b. Draw an ogive for the data of Table III, part A.

2.4. a. Draw a frequency polygon for the distribution of Table III, part B.
b. Draw an ogive for the data of Table III, part B.

* These are so arranged that frequently an even-numbered exercise is of the same type 
as its immediately preceding odd-numbered exercise. “Thought” questions are intended 
as thinking, not thumbing, exercises.

Table II. Scores on final examination for a course on psychological tests

Undergraduates (N = 32)     Graduate Students (N = 23)

70  72  76  66  76  80  84  80  90  82
84  67  69  90  50  76  47  79  62  77
89  70  51  58  71  88  65  54  73  74
87  76  79  89  64  80  67  71  90  85
95  78  69  97  91  71  63  81  87  81
78  86  92
79  79


CHAPTER 3 

3.1. For the scores of Table I, compute separately for the two groups: 

a. the medians, using the undistributed scores. 

b. the medians, using the frequency distributions. 

3.2. Repeat exercise 3.1 with the data of Table II. 

3.3. Compute the mean for each group in Table I by 

a. the definition formula for the mean. 

b. the arbitrary origin method. 

3.4. Repeat exercise 3.3 with the data of Table II. 

3.5. Combine the two distributions for the data of Table I, compute the mean 
by the arbitrary origin method, and check by using the formula for securing the 
mean for a combined group (use the means obtained by the arbitrary origin 
method for this check). 

3.6. Repeat exercise 3.5 with the data of Table II. 

3.7. Compute the median, $Q_1$, and $Q_3$ for the distribution in Table III, part A.

3.8. Compute the median, the 20th and the 80th percentile points for the 
distribution in Table III, part B. 

3.9. Using the results of exercise 3.7, locate the three points, $Q_1$, the median,
and $Q_3$, on the base line of your ogive curve for the distribution of Table III,
part A. Divide the ordinate on the right-hand side (the ordinate at IQ = 170) 
into approximate fourths. Draw a line from each of the three base-line points up 
to the ogive, then horizontally to the right. Notice where these horizontal lines 
hit the ordinate on the right-hand side. 

3.10. Using the results of exercise 3.8, locate the three points, the median, $P_{20}$,
and $P_{80}$, on the base line of your ogive curve for the distribution of Table III,
part B. Divide the ordinate on the right-hand side (the ordinate at IQ = 180) 
into approximate fifths. Draw a line from each of the three base-line points up to 
the ogive, then horizontally to the right. Note where these horizontals hit the 
ordinate on the right-hand side. 

3.11. Compute the standard deviations for the two groups in Table I (use 
arbitrary origin method). 

3.12. Repeat exercise 3.11 with the data of Table II.

Table III. Distribution of IQs, form L of 1937 Stanford-Binet scale

             A. Ages 2½-5½       B. Ages 6-13
IQ             f    cum f          f    cum f
170-179                            1     1623
160-169        4     728           1     1622
150-159        4     724           3     1621
140-149       11     720          29     1618
130-139       41     709          73     1589
120-129       82     668         140     1516
110-119      175     586         308     1376
100-109      193     411         407     1068
90-99        107     218         335      661
80-89         76     111         215      326
70-79         20      35          76      111
60-69          7      15          30       35
50-59          5       8           4        5
40-49          2       3           1        1
30-39          1       1

             N = 728             N = 1623



3.13. For the distribution of IQs in Table III, part A, the mean is 106.68 and 
the standard deviation is 17.41. 

a. Determine the two points defined by M ± S. 

b. Determine the two points defined by M ± 2S. 

c. Locate these four points, also the mean, on the base line of your 
frequency polygon for the data of Table III, part A. Erect ordinates
from each of these five base-line points to the polygon, and study the 
resulting picture. 

d. Determine approximately the percentage of cases between M ± S; 
also between M ± 2S. 

3.14. The distribution of IQs in Table III, part B, has a mean of 103.34 and an S 
of 16.88. Repeat exercise 3.13, using the values and polygon for the data of 
Table III, part B. 

3.15. Suppose the mean score on a statistics quiz is 35, the median is 36, the S is 
6, and the quartile deviation is 4. 

a. If to each person’s score we added 50 points, what values would we
then get for the mean, the median, the S, and the quartile deviation? 

b. If we doubled each person’s score, what would be the values of the 
new mean and S ? 

3.16. Given that the distribution of scores on a quiz leads to a mean of 40, a 
median of 38, an S of 9, and a quartile deviation of 6.
a. If we added 10 points to the scores of each student, what would be
the values for M, Mdn, S, and Q?

b. If all scores were halved, what would be the values of the mean and
the S?

3.17. If you were told that the mean final score for the 50 students was 80 and 
the mean for the 30 men in the class was 82.3, what would you figure as the mean 
for the women ? 

3.18. Given that the mean weekly pay of the 7 working members of the Jones 
family is $55 and the median is $50 (both after deductions). 

a. What is the weekly “take home” of the family? 

b. Suppose that Daddy Jones, already the best paid, receives an increase 
which after deductions amounts to $6 a week. What is the new mean? 
What is the new median ? 

3.19. If an S is 9 when computed from a frequency distribution with intervals 
of size 6, what would you expect it to be if computed by using the definition 
formula for S?

3.20. How large is the grouping error in an S of 13 computed from a
distribution with intervals of size 12?

3.21. Why would we usually expect the difference between the 10th and 20th 
percentile points to exceed the difference between the 40th and 50th percentile 
points ? 

3.22. Suppose that A knows only that the Q of a distribution is 20, whereas B 
knows that the 75th percentile is 30 units from the median and the 25th percentile 
is 10 units from the median. What can B tell about the distribution that A 
cannot ? 

CHAPTER 4 

4.1. Assume that the IQs for a large number of unselected elementary school 
children are distributed as a normal curve with a mean of 100 and an S of 17. 

a. The first quartile point will be near what value? 

b. The percentage with IQs above 130 will be? 

c. The middle 80 per cent will fall between what values ? 

d. The 99th percentile will be near what IQ value ? 

e. The percentage with IQs below 70 will be?

4.2. Let us presume that the Army General Classification Test yields a normal 
distribution of scores, with mean of 100 and S of 20. 

a. The value of the third quartile will be near what score? 

b. The first percentile point will be at what score? 

c. Between a score of 70 and a score of 130 will be found what percentage 
of the cases ? 

d. The middle 60 per cent of scores will fall between what score values?

e. The value of the quartile deviation will be what? 

4.3. One way to comprehend the meaning of either sizable or small differences 
between groups is to consider the extent to which the distributions overlap.
Given the following data for weights of college students:

Men: M = 142, S = 15; Women: M = 120, S = 12

Assuming normality for both distributions, how many men per thousand are 
lighter than the average woman ? Determine the number of women per thousand 
who are heavier than the average man. 

4.4. If the mean height for college men is 68.5 inches and the S is 2.8, and if the 
mean height for college women is 64.5 and the S is 2.5, what proportion of women 
exceed the average man in height ? What proportion of men fall below the average 
height for women ? 

4.5. Suppose that the distribution of numerical grades in a course is normal 
with a mean of 60 and an S of 10. The instructor wishes to assign letter grades as 
follows: 15 per cent As, 35 per cent Bs, 35 per cent Cs, and 15 per cent Ds. 
Determine to the nearest score the dividing line between the As and Bs, between 
the Bs and Cs, and between the Cs and Ds. 

4.6. Suppose that it has been decided to use a five-letter grading system, A, B, 
C, D, and E, and that it is required that the letters shall correspond to “equal” 
distances on the base line, the whole of which is taken to be 6 Ss. Assuming 
normality, what percentage would be assigned As; Bs; Cs? 

4.7. Determine the height of the unit normal curve at the point which is 1.2 $\sigma$
units below the median; at the third quartile point.

4.8. What is the height of the ordinate of the unit normal curve corresponding
to the $x/\sigma$ value that cuts off the upper 10 per cent of the curve? The lower 25
per cent ? 

4.9. Frequently, we must be able to translate percentile scores to standard 
scores and vice versa (assume normality). 

a. What are the standard scores (to the nearest tenth) which correspond 
to the following percentiles: 44th, 99th ? 

b. What are the percentile equivalents (to nearest value) of the following 
standard scores: —1.34, +2.06? 

4.10. Suppose a typical bell-shaped distribution. What is the approximate 
percentile value of the following points: the mean, $Q_3$, the point which
is one S above the mean, and the first decile point? 

4.11. What is the $x/\sigma$ distance between the following (assume normality):

a. the 10th and the 90th percentile points? 

b. the 25th and the 75th percentile points ? 

4.12. If a distribution of scores is normal, what is the $x/\sigma$ distance between
the 10th and 20th percentile points? Between the 40th and 50th percentile 
points ? 

4.13. Given that a reading test for unselected 10-year-olds yields a mean of 
50 and an S of 10, whereas an arithmetic test gives a mean of 48 and an S of 8. 
If Joe Bloke scores 52 on reading and 50 on arithmetic, is he better in reading 
than in arithmetic ? Why ? 





4.14. If a student’s reading rate score falls at the 20th percentile, and his
standard score on reading comprehension is -1.4, would you conclude that his
comprehension was superior to his rate? Why?

4.15. M ± 2 quartile deviations will give two points within which, for normal
distributions, one would expect about what per cent of the cases?

4.16. For Test A the scores are transformed to Z scores with M = 50 and
S = 10, and for Test B the scores are transformed to T scores with M = 50,
S = 10. Why may a score of 60 on Test A be not comparable to a score of 60
on Test B?

4.17. Suppose we have a distribution with skewness, $g_1$ = .60. If we
transformed the scores into standard scores; also to T scores; and also to
percentiles; what can you say regarding the shape of the distribution of

a. the standard scores?

b. the T scores?

c. the percentile scores?

4.18. Given that the distributions for two groups, with Ns equal, are normal
in form. Now consider the distribution for the two groups combined. Under
what condition would you expect the shape of the combined distribution to be:
Platykurtic? Leptokurtic? Normal?

CHAPTER 5 

5.1. If you tossed 4 unbiased pennies 160 times, how often would you expect
to have 2 heads and 2 tails?

5.2. Suppose you roll a pair of fair dice once. What is the probability that
exactly 11 spots will turn up?

5.3. Suppose that you are rolling 2 fair dice, one red and the other white.
What is the probability of obtaining a 3 spot on the red die and a 4 spot on the
white one?

5.4. In that back-alley game known as “crap shooting,” the obtaining of spots
on the 2 dice totaling 7 seems to be of paramount importance at certain times.
What is the probability of rolling a 7 (assume gentlemen’s dice)?

5.5. Suppose that we have 3 pyramidal objects (perfectly homogeneous) which
can be rolled like dice. The sides of each are numbered 1, 2, 3, 4, and success
is defined as the getting of 4s on the down sides. Determine the probability of
obtaining exactly three 4s; exactly two 4s; exactly one 4; and no 4s. What is
the probability of securing at least two 4s?

5.6. If you were dealt 1 card from each of 5 well-shuffled decks, what is the
probability of all 5 cards being spades?

5.7. The probability of drawing a red card from an ordinary (and well-shuffled)
deck is 1/2 and the probability of drawing a heart is 1/4. Why isn’t 1/2 plus 1/4 the
probability of drawing either a heart or a red card, or is it?

5.8. Suppose that for a class of 100 the number of As given on the first quiz
is 15 and that the number of As on the second quiz is also 15. Suppose further
that the names of the students are placed on slips which are then well mixed
[...]

5.9. A student takes a four-alternative, 12-question multiple-choice test. If he
[...] guesses, what is the probability that he will get all 12 questions correct?

5.10. The typical ESP deck consists of 25 cards, with 5 cards for each of 5
[...] by the experimenter in a room remote from the person. The “score” is the
number of correct calls in a run through the pack.

a. [...] the numerical values for p, q, and n for the binomial distribution
[...]

b. Would “scores” of 3 and 7 be equally likely on a chance basis? Why?

5.11. a. [...] the number of heads that turn up, thereby obtaining a
frequency distribution with an N of 64. Label this Series A. Toss the coins
64 more times, and label the resulting [...] Series B. Then combine the two
distributions.

b. [...] binomial expansion, ascertain the expected distribution
when 6 coins are tossed 64 times; 128 times.

c. Compute the mean and standard deviation for each of your three
distributions; also for the expected distribution (round to [...] decimals).

d. Determine the proportion of times that 3 heads, also 6 heads, turned
up [...] and in the combined series. Compare these results
with the expected proportions.

e. Subtract the mean of Series A from that of Series B (keep sign if
negative). For the proportion of times 3 heads turned up, subtract
the Series A proportion from that for Series B (keep sign).

f. [...] the results to class so that frequency distributions may be
[...] proportions, and differences between Ms and between
proportions.

5.12. Do exercise 5.11, using 7 coins.

5.13. If 42 of [...] rats turn to the right at the first choice point in a maze, would
you conclude that rats, in general, prefer to turn to the right at this choice point?

5.14. If at a particular time 50 per cent of all eligible voters favor the Democrats
[...]

5.15. [...] an intelligence test of the Binet type are at times assigned an
index of difficulty [...] more than the percentage passing the item.
Given the following for an item: of 100 12-year-olds, 60 per cent passed; of
100 13-year-olds, 80 per cent passed. When possible sampling errors are
considered, would you conclude from these two difficulty indices that the item is
really more difficult for 12-year-olds? State the significance level associated with
your conclusion.

5.16. If a political issue is favored by 55 per cent of a sample of 200 Republicans, 
and by 46 per cent of a sample of 250 Democrats, would you conclude that the 
populations of Republicans and Democrats differ on the issue? 

5.17. a. Given the data in Table IV, do items a and b differ significantly in
difficulty for the 4-year-olds? Ditto, the 5-year-olds?
b. Is there a significant difference between 4- and 5-year-olds on item a?
On item b?


Table IV. Data for passing (P) and failing (F) items on the Stanford-Binet Test

           4-year-olds                      5-year-olds
        Item          Item              Item          Item
Case   a   b   Case   a   b     Case   a   b   Case   a   b
  1    F   F    21    P   F      41    P   P    61    P   P
  2    P   F    22    P   F      42    P   P    62    P   P
  3    P   P    23    P   F      43    P   F    63    P   F
  4    F   P    24    P   P      44    F   F    64    P   P
  5    P   F    25    P   F      45    P   P    65    F   F
  6    F   F    26    P   P      46    P   P    66    P   P
  7    F   F    27    P   F      47    F   F    67    F   F
  8    P   F    28    F   F      48    P   P    68    P   P
  9    P   F    29    P   P      49    P   P    69    P   P
 10    F   F    30    P   F      50    P   P    70    F   F
 11    P   P    31    P   P      51    P   P    71    P   P
 12    P   P    32    P   F      52    P   P    72    F   F
 13    F   F    33    P   F      53    P   P    73    P   F
 14    P   P    34    F   F      54    P   P    74    P   F
 15    P   F    35    P   P      55    P   F    75    F   F
 16    P   F    36    P   F      56    P   F    76    F   F
 17    P   P    37    P   P      57    P   P    77    P   F
 18    F   F    38    F   F      58    P   P    78    P   F
 19    F   F    39    F   F      59    P   F    79    F   P
 20    P   P    40    P   F      60    P   F    80    P   F


5.18. a. Would you conclude from the data of Table V that items c and d differ
significantly in difficulty for the 6-year-olds? Ditto, the 7-year-olds?
b. Would you conclude from the data of Table V that, in general, 7-year-olds
are more successful than 6-year-olds on item c? On item d?

Table V. Passing (P) and failing (F) information on two Binet Test items at two
age levels

           6-year-olds                      7-year-olds
        Item          Item              Item          Item
Case   c   d   Case   c   d     Case   c   d   Case   c   d
  1    F   F    21    P   P      41    P   F    61    P   F
  2    P   P    22    P   F      42    P   P    62    F   F
  3    F   P    23    F   F      43    F   P    63    P   F
  4    F   F    24    F   F      44    P   P    64    F   F
  5    F   F    25    F   F      45    F   P    65    P   P
  6    F   F    26    P   P      46    F   F    66    P   P
  7    P   F    27    P   F      47    P   P    67    P   P
  8    F   F    28    P   F      48    F   P    68    P   P
  9    P   F    29    F   F      49    P   P    69    P   P
 10    F   F    30    F   F      50    P   F    70    P   P
 11    P   F    31    F   F      51    F   F    71    P   P
 12    P   P    32    F   F      52    F   F    72    P   P
 13    F   F    33    P   F      53    F   F    73    P   F
 14    P   F    34    P   F      54    P   P    74    P   F
 15    F   F    35    F   F      55    F   F    75    F   F
 16    P   P    36    F   F      56    P   P    76    P   P
 17    F   F    37    F   P      57    F   F    77    P   F
 18    F   F    38    P   F      58    P   P    78    P   F
 19    F   F    39    F   F      59    P   P    79    P   F
 20    F   F    40    P   P      60    F   F    80    P   P


5.19. In a student presidential election, Mr. Ralph received 2389, or 60 per 
cent, of the votes cast. Suppose that you had been able to poll a sample of 100 
the day before the election. Assuming that such “last day” changes as took 
place were balanced so that neither candidate gained, how often would samples 
of 100 yield a majority? (i.e., what is the approximate probability that on the 
basis of a sample of 100 you would have predicted Mr. Ralph’s election?). 
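
One way to set up the computation for exercise 5.19 is sketched below. It is only a sketch under stated assumptions: the sampling distribution of the sample proportion is treated as normal, no continuity correction is applied, and the finite size of the electorate is ignored. The function names are illustrative and do not come from the text.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal curve up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_sample_majority(p, n):
    """Approximate probability that a sample of n shows more than 50 per cent
    for a candidate whose population proportion is p (normal approximation,
    no continuity correction; both are assumptions of this sketch)."""
    se = math.sqrt(p * (1.0 - p) / n)   # standard error of a proportion
    z = (0.50 - p) / se                 # deviation of .50 from p in SE units
    return 1.0 - normal_cdf(z)          # area above the 50 per cent point


if __name__ == "__main__":
    print(round(prob_sample_majority(0.60, 100), 3))
```

Changing p or n in the call shows how quickly the chance of a correct forecast falls as the true division of opinion approaches 50-50.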

5.20. A sample of N = 100 yields a percentage of 55 yeses to a question. Under
what condition would you expect a large number of successive random samples 
to yield percentages of yeses that would average 55 ? 

5.21. Suppose a situation involving the difference between groups which calls
for a two-tailed test of significance, and that we have decided on P = .01 as
our level of significance.

a. What is the probability of committing a type I error if the null
hypothesis is really true?

b. If the null hypothesis is not true, what is the probability of making a
type I error?

c. If the true difference were 3.3, what additional information would you
need in order to figure out the probability of making a type II error?

5.22. Let beta stand for the probability of correctly rejecting the null hypothesis 
and let one minus beta stand for the probability of making the type II error. 
Under what condition could these two probabilities be equal ? 

5.23. For a sample of 100 it is found that 60 say yes and 40 say no when asked a
certain question. For the difference, .60 - .40, between the two proportions
why would it be incorrect to take the square root of $p_1 q_1/100 + p_2 q_2/100$ as the
standard error of the difference?

5.24. Consider the setup for testing the difference between two nonindependent,
or related, proportions via the square root of (a + d)/N. Although we have not
explained the concept of correlation (or association), do you see a basis for
saying that (a + d)/N tends to be smaller the higher the correlation between
the two sets of responses?

5.25. Suppose percentages of 37 and 39 are found for two samples, each of size
100, drawn from a defined population. Since a difference as large as 2 percentage
points can easily, for the given Ns, arise on a chance basis, it would seem safe to
conclude that the samples are in very close agreement. From this degree of
similarity in results, would you conclude that the sampling method has avoided
bias? Explain or defend your answer.

5.26. Some textbooks have argued that whether or not a sample is representative
(i.e., not biased) can be judged by splitting it (the sample) into random halves,
and then claim representativeness if the means for the two halves are not
significantly different. Any comment?

CHAPTER 6 

6.1. For a sample of 2970 cases, ages 2.5 to 18, the distribution of IQs on
Form L of the 1937 Stanford-Binet yields:

Mean = 104.00     Skewness ($g_1$) = .028
S = 17.03         Kurtosis ($g_2$) = .346

In answering the following questions, indicate the steps in your computations. 

a. Would you conclude that the mean IQ of the population for these ages 
is 100 (the value expected for a properly constructed IQ test)? 

b. Is it reasonable to believe that the IQ distribution for the population, 
at these ages, has normal skewness? 

c. Would you conclude from the sample kurtosis that the kurtosis for 
the population differs from normal kurtosis ? 

6.2. Suppose that the mean IQ for the general population is 100 and the standard
deviation is 17. If a sample of 289 cases were drawn at random, what would
be the probability of obtaining a mean as great as 101? As low as 98?

6.3. Suppose it is known that the standard deviation of scores for a population 
is 20. How many cases would you need to draw in order that the standard 
error of 

a. a sample mean be 2 score points ? 

b. a sample S be 3 points ? 

6.4. Suppose that you are polling on an issue for which opinion seems about 
equally divided. How many cases (how large an N ) would you need to be sure 
(at the .01 level of significance) that a sample deviation of 3 per cent from 50 per 
cent is nonchance ? 

6.5. One of the requirements of a good IQ test is that the mean IQ for unselected
cases of any school age group shall be 100, and that the distributions for
the several age groups shall have the same standard deviations. Given the
following for the 1937 Stanford-Binet Test:

Age       6      12
N       203     202
M     101.0   103.6
S      12.5    20.0


a. Is it reasonable to believe that the test is yielding the desired mean when 
used with 12-year-olds? 

b. Would you judge from the results for these two age groups that the 
requirement of equal variability has been met ? 

6.6. The means and standard deviations for two groups of twins on spool
packing are as follows:

      Fraternals   Identicals
N         92           94
M        761          741
S         79           66

Do these groups differ significantly in mean performance? In variability?

6.7. Two forms of a test, to be comparable, should yield similar means and
similar standard deviations when given to a group. For 202 cases of age 7, we
have the following data for the 1937 Stanford-Binet:

      Form L   Form M
M      101.8    103.5
S       16.2     15.6

In order to balance practice effect, one-half the group was tested on Form L,
then on Form M, whereas the reverse order was used for the other half. The
correlation between the two sets of IQs was .93. Is the obtained difference
between means larger than one would expect on the basis of chance sampling?
Ditto, the difference between the Ss?

6.8. Measurements on 1000 of each sex at birth have been reported in the
literature. The mean length of boys (in centimeters) was 50.51 and the S was
2.99, and the values for the girls were 49.90 and 3.00. Is there evidence here for
sex difference in length at birth?

6.9. Given a two-tailed test of the hypothesis that no change has taken place,
that the standard error of the mean change is 3 points, and that we use the
.05 level for judging significance.

a. If the true change is a loss of 6 points, the probability (beta) of correctly
rejecting the null hypothesis is approximately ______, whereas if the
true change is a gain of 12 points, the value of beta is approximately
______.

b. If the true change is zero, the value of beta is ______.

c. If, instead of a two-tailed test, a one-tailed test were used, the
probability of making the type I error would be ______.
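
The reasoning asked for in exercise 6.9 can be sketched in a few lines. The sketch below assumes a normal sampling distribution for the mean change and a critical ratio of 1.96 for the two-tailed .05 level; following the exercise, "beta" here means the probability of correctly rejecting the null hypothesis. The function names are illustrative, not the text's.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal curve up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_reject_two_tailed(true_change, se, z_crit=1.96):
    """Probability that a two-tailed test rejects the hypothesis of no change
    when the sampling distribution is really centered on true_change."""
    shift = true_change / se                      # true change in SE units
    upper = 1.0 - normal_cdf(z_crit - shift)      # area beyond +z_crit
    lower = normal_cdf(-z_crit - shift)           # area beyond -z_crit
    return upper + lower


if __name__ == "__main__":
    for change in (-6, 12, 0):
        print(change, round(prob_reject_two_tailed(change, 3.0), 3))
```

When the true change is zero the function simply returns the chosen significance level, which is a useful check on the setup.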

6.10. Given that a two-tailed test is appropriate, that we have chosen the .05 
level of significance, that the obtained difference between two means is 3, and 
that the standard error of the difference is 2. 

a. Would we reject the null hypothesis? 

b. What is the probability that we will make a type I error? 

c. If the true or population difference were 6, what is the (approximate)
probability that we would correctly reject the null hypothesis?

d. If the true difference were zero, what can you say about the likelihood 
of making a type II error ? 

6.11. Given that a sample yields 98 per cent of yeses to a question and that the
standard error of the percentage is 2. If we set the .99 confidence limits as 98
± 2.58(2) we arrive at the absurdity of an upper limit in excess of 100 per cent.
Why?

6.12. Suppose you draw a sample of size 3 (yielding scores of 90, 99, and 102; 
mean = 97) from a population which you know to have a mean of 100. What 
would be your best single estimate of the population variance? 

6.13. Consider the following two statements: (a) the probability is .95 that
sample means will not deviate more than $1.96 S_M$ from the population mean, and
(b) the probability is .99 that the population mean will not deviate more than
$2.58 S_M$ from a sample mean. Which statement is false? Why?

6.14. “The true mean has a 95 per cent chance of falling in the 95 per cent
confidence interval for the true mean.” This statement is incorrect. Why?
Restate it in correct fashion.

6.15. If the standard deviation for a very large (infinite) population equals that 
for a small (finite) population, samples of the same size from the two populations 
will lead to confidence intervals for the population means that are the same or
different in width? Why?

6.16. For normal distributions the sample mean and sample median are
estimators of the same central value (location parameter). For fixed N, how will
the .95 confidence limits differ when we use interval estimation based on the
mean and also on the median? Why?

6.17. We did not discuss the sampling instability of percentiles. Do you think
that for a distribution of 200 scores the 55th percentile point will be more or less
stable in the sampling sense than the 95th percentile point? Why?

6.18. The text claims that, for samples drawn from a normally distributed
population, the median is less stable in the sampling sense than the mean. Can
you specify a type of distribution (as to shape) for which the median might be
more stable than the mean? Explain.


CHAPTER 7 

7.1. Experimenter A randomly assigns 12 persons to an experimental group
and 12 other persons to a control group, thus assuring independence for the two
groups. Experimenter B assigns 12 persons at random to an experimental
group but for the control group he selects 12 persons by matching (by pairs)
them with the 12 individuals in the experimental group. Both experimenters
evaluate the difference between the group means (their own groups) via the t test.

a. How many degrees of freedom for A’s t test?

b. How many degrees of freedom for B’s t test?

c. Which experimenter do you think would have greater precision in
results? Why?

7.2. When comparing means for two sets of scores belonging to two independent
groups each of size N = 8, we find the number of degrees of freedom is
______, and when we are comparing means of two sets of scores belonging to
two matched groups of 10 each the number of degrees of freedom is ______.

7.3. In connection with a topic not yet studied, the sums of squares of deviations
about means may be combined for three groups in order to obtain a
______. If the Ns are 14, 8, and 10, the number of degrees of freedom
is ______.

7.4. Given that the unbiased estimate of the standard error of a mean difference
is 4, that the chosen level for judging significance is .01, that a one-tailed test is
appropriate, and that for df = N - 1 = 24 - 1 = 23 the value of t must
reach 2.50 for claiming significance at the adopted .01 level.

a. Under these circumstances the probability of committing the type I
error is what?

b. How large would the mean difference for the population need to be in 
order to have the probability of making the type II error be exactly .50? 

J’ 5 ’fiH G ' Ven 3 me ? n °. f 50 baS6d ° n 21 cases ’ with ^ = 2. Ascertain the .95 
confidence interval; also the .99 confidence limits. 

7.6. When setting the .95 confidence interval for a population mean the
procedure for small samples differs from that for large samples in what two
respects?

7.7. An experimenter knows that the σ for population A is 4 and the σ for
population B is 3. He draws a sample of size 10 from population A and a
sample of size 10 from population B. In testing the significance of the difference
between the means of the two samples he uses large sample techniques. Is he
justified in doing this? Why?

7.8. An experimenter uses a sample of size 15. He wishes to test the hypothesis
that the population mean is 100. However, he knows nothing of small sample
techniques. If he uses large sample techniques for his test, will he increase or
decrease his probability of making a type II error over what it would be if he
used small sample techniques? Why?

CHAPTERS 8 AND 9 

8-9.1. a. Using the data of Table VI, make a scatter diagram with “Ex”
on the y axis, intervals of size 5; and with “TMT” on the x axis,
with i = 3 and the first interval taken as 105-107 (interval sizes are
suggested in order to facilitate an exact check of the tallying and
subsequent computations).

b. From the scatter diagram, compute the correlation between “Ex”
and “TMT”; also compute the two means and the two standard
deviations.

c. Write the regression equation for predicting “Ex” from “TMT.”
Draw the regression line on your scatter diagram.

d. Determine the error of estimate for predicting “Ex” from a knowledge
of “TMT.”

e. What percentage of the variance in “Ex” is due to or associated with
variation in “TMT”?
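
As a check on hand computation, the quantities asked for in exercise 8-9.1 can also be obtained from raw (ungrouped) scores. The sketch below is illustrative only: it uses just the first 13 “Ex” and “TMT” pairs printed in Table VI, it assumes that S is computed with N in the denominator, and its function names are not the text's. Results from a grouped scatter diagram will differ slightly from raw-score results.

```python
import math

# First 13 (Ex, TMT) pairs as printed in Table VI; a subset for illustration only.
PAIRS = [(62, 123), (107, 129), (87, 131), (95, 129), (100, 122), (87, 136),
         (87, 125), (64, 121), (89, 131), (58, 128), (84, 123), (80, 127),
         (82, 120)]


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Standard deviation with N in the denominator (an assumption here)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(xs, ys):
    """Product-moment correlation from raw scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sd(xs) * sd(ys))


def regression_y_on_x(xs, ys):
    """Return (B, A) for the raw-score equation Y' = B*X + A."""
    b = pearson_r(xs, ys) * sd(ys) / sd(xs)
    a = mean(ys) - b * mean(xs)
    return b, a


def error_of_estimate(xs, ys):
    """Standard error of estimate for predicting Y from X."""
    return sd(ys) * math.sqrt(1.0 - pearson_r(xs, ys) ** 2)


tmt = [t for _, t in PAIRS]
ex = [e for e, _ in PAIRS]
r = pearson_r(tmt, ex)
b, a = regression_y_on_x(tmt, ex)

if __name__ == "__main__":
    print(f"r = {r:.3f}; Ex' = {b:.3f}(TMT) + {a:.3f}")
    print(f"error of estimate = {error_of_estimate(tmt, ex):.3f}; r squared = {r * r:.3f}")
```

The same functions serve for exercise 8-9.2 if the “CM” column is substituted for “TMT.”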

8-9.2. Do exercises 8-9.1 with “CM” substituted for “TMT” (an appropriate
interval size for “CM” is rather obvious).

9.3. The standard deviation of difference scores (D) based on $z_x$ and $z_y$
($D = z_x - z_y$) is .40. What is the correlation between X and Y? How did you
get your answer?

9.4. Consider the general formula for the variance of a difference,
$S_D^2 = S_x^2 + S_y^2 - 2 r_{xy} S_x S_y$. Can you suggest a method for determining the r between two
sets of correlated scores?

9.5. Consider Y as weight and X as height and that r has been computed and
that B and A have been calculated for the regression equation $Y' = BX + A$.
Now as regards the metric (or measurement units), Y is in pounds and X
is in inches. What can you say about the units (or metric) for A? For B?
For r?

9.6. Suppose we consider the regression lines, that for Y on X and that for X
on Y. If the assumption of linearity holds for both lines, under what condition
will the two regression lines

a. coincide?

b. be at right angles to each other?

c. both have negative slopes?

Table VI. Data for 38 students in a course on mental tests (“Ex” stands for final
examination scores, “TMT” stands for IQs based on the Terman-McNemar Test of
Mental Ability, “CM” stands for scores on the Terman Concept Mastery Test)

 Ex   TMT    CM
 62   123    47
107   129    59
 87   131    78
 95   129    74
100   122    52
 87   136   127
 87   125    74
 64   121    46
 89   131    97
 58   128    71
 84   123    28
 80   127    53
 82   120    53
 54   128    69
 86   132    82
 92   114    41
 67   113    44
102   141   112
 79   132    72
 82   126    54
 96   131   111
 77   131    65
 75   109    22
 93   131   108
 67   106    25

9.7. Occasionally we encounter a statement which goes something like this:
What doeTtSimS^ 0 ^ ^ ^ '° W “ ^ ^ high SCOrers ' 

9.8. Although it was argued that $S_{y.x}$ is preferable to the standard deviations
of arrays about their own means as a way of specifying errors when
Y is predicted from X, can you indicate two possible situations, distinctly
different, for which the array sigmas would be much better than $S_{y.x}$?

9.9. Under what condition do three variances add to a total variance ? 

9.10. Given: $M_x = 40$, $M_y = 50$, $S_x = 8$, $S_y = 6$, and $r_{xy} = .00$. What will
the mean and the standard deviation be for the sum score, X + Y?

9.11. How can scores be manipulated without changing the correlation
between X and Y?

9.12. Suppose that the score on a first quiz is to be combined with the score
on a second quiz so as to give equal weight to the two quizzes. Why might a
simple adding of the two scores for each individual fail to accomplish the desired
equal weighting?

9.13. We saw that $r^2$ was equal to the ratio of two variances, hence providing
a percentage interpretation of correlation. Algebraically, by taking square roots
we have r as the ratio of two Ss. Why cannot the latter ratio be safely interpreted
in percentage terms?

9.14. Numerically an r of .60 is twice an r of .30; under what circumstances, as 
regards interpretation, can we regard 

a. .60 as exactly four times .30?

b. .60 as nearly four times .30? 

c. .60 as being twice .30? 

9.15. Test Y has an S = 10. The S of the predicted Ys from X ($S_{y'}$) = 6. What
is the standard error of estimate ($S_{y.x}$)? What is the correlation between X and
Y?

9.16. A critic of the text has said that $\sum (y - y')^2 / N$ does not qualify as a
variance unless it is first demonstrated that $\sum (y - y') = 0$. Can you supply
a very, very simple algebraic proof that $\sum (y - y')$ does equal zero?

9.17. Given that the correlation between X and Y for Group A is .60 and for
Group B also .60. Under what specific conditions would the correlation between
X and Y for the two groups combined be much higher, say, .80? Ditto, much
lower?

9.18. Although it is indubitably true that an r must reach .866 to reduce the 
error of estimate by 50 per cent of what it would be for an r of zero, do you 
see another way of looking at the situation which does not make the picture so
“black”? Hint: Imagine the two scatter diagrams, one for an r of .866 between,
say, W and 7, the other for an r of zero between W and X, with W being a 
variable to be predicted and Y and X being possible predictors. Now suppose 
a person 2 standard score units above average on both Y and X. 

CHAPTER 10 

10.1. An N of 101 will yield .10 as the standard error for near zero rs. If 
we have adopted the .05 level of significance and are using a two-tailed test: 

a. What is the probability of making a type I error if the population r is
zero?

b. What is the probability of making a type I error if the population r is 
+ .06? 

c. What is the probability of making a type II error if the population r is
-.20?

d. How often would we correctly reject the null hypothesis if the popula¬ 
tion r were +.20? 

10.2. For large samples the sampling distribution of low rs may be regarded
as normal, with standard error of $1/\sqrt{N - 1}$. Suppose in what follows that N
is 101, thus giving a $\sigma_r$ of .10.

a. If the null hypothesis of no correlation for the population being
sampled is true, the probability of a sample r exceeding +.20 is
approximately what?

b. When the r in the population is .10 and we are using a two-tailed
test and the .05 level of significance, the probability of committing
the type II error is approximately what?

c. If a population r is .26 and we are using a one-tailed test and the .01
level, the probability of correctly rejecting the null hypothesis is
approximately what?

d. If a sample r is .15, the .95 confidence limits for the population r are
approximately what?

e. For part d, the probability that the limits so set will not include the
population value is what?

10.3. The classical formula, $(1 - r^2)/\sqrt{N}$, for the standard error of an r
implies that the degree of sampling stability for an r = .90 is greater than for
an r = .30, yet by the z transformation the standard errors for the two
corresponding zs are the same. What can you say about the relative sampling stability
of an r of .30 and one of .90, each based on 103 cases?
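
The contrast raised in exercise 10.3 can be made concrete by setting limits on the z scale and converting them back to r. The sketch below assumes the usual form of the z transformation, z = arctanh(r), with standard error $1/\sqrt{N - 3}$, and a critical ratio of 1.96; the function name is illustrative and not the text's.

```python
import math


def fisher_z_limits(r, n, z_crit=1.96):
    """Confidence limits for a population r, set on the z scale and
    converted back to the r scale (assumes SE of z is 1/sqrt(n - 3))."""
    z = math.atanh(r)                    # r to z
    se = 1.0 / math.sqrt(n - 3)          # same SE whatever the size of r
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    for r in (0.30, 0.90):
        low, high = fisher_z_limits(r, 103)
        print(f"r = {r:.2f}: limits {low:.3f} to {high:.3f}, width {high - low:.3f}")
```

Equal standard errors on the z scale thus correspond to unequal, and for high r asymmetric, intervals on the r scale.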

10.4. Do you think that a sample r of .80 could as readily be a chance deviation
downward from a population value of .90 as it could be a chance deviation
upward from a population value of .70? Explain.

10.5. It was argued that the degrees of freedom, N - 2, for the t test of r was
logical because the two constants, B and A, in the regression are calculated from
the data, hence two restrictions and hence 2 subtracted from N. What happens to
the number of degrees of freedom when the regression equation is written in
standard score form, or $z'_y = r z_x$?

10.6. Given that the population value for the correlation coefficient between
two variables is +1.00 (possible only when errors of measurement are zero):

a. What would you expect the correlation to be for a sample of 100 cases?

b. What would be the value of the standard (sampling) error for the
correlation based on 100 cases?

c. If one variable were curtailed so as to lead to a 50 per cent reduction
in its standard deviation, what would you expect as a sample value
for r?

10.7. As regards the shape of their random sampling distributions, proportions 
and correlation coefficients have what in common under conditions A (specify) 
and what in common under conditions B (specify) ? 

10.8. For Group 1 the r between two variables is found to be .25, which is 
significant at the .01 level, while for Group 2 the same two variables correlate 
.10, which is significant at the .16 level. Since the investigator has adopted the 
.05 level for judging significance, he concludes that (because one r is and the 
other is not significant) the degree of correlation in the two populations being 
sampled is different. Any comment ? 

10.9. Suppose an r of .90 based on 16 cases, and suppose that we establish the 
P = .95 confidence limits first by use of the classical standard error of r and then 
by way of the z transformation. Which method will yield the higher upper 
bound? Why? (No arithmetic called for.) 

10.10. Distinguish between $S^2(1 - r^2)$ and $S^2(1 - r)$. What subscripts are
lit ^Suppose you are to choose 

of 3. Which would you choose and why? Test b ^ ^ s of 

m i? Test A yields an S ol 1^ dna dn ’ u ^ 

For example, ^ between 70 and 74 today 

»ss^mkssss3BOT 

very unreliable tests. strictly parallel forms. This means 

10M lr SUP L 0S 5 F - m s VoA - and are similar in content; hence r M , the 
that M a = M„, S a - *i, fbotn x >, that a true score, given an 

reliability the equation *' t = with an 

obtained score, is best _ /v7~ it also follows from our 

error of estimate given by S t - ■\^ ' estimate of a scor e on 

discussion of r and regression equations that the standard 

Form B given an obtained is by * „ = r«A - '» x « ™ 

F ’ 8 ■ , -S VUUy;. Obviously, the regression estimates of 

error of estimate is S b . a - 1 “ 6 f timate for the latter is always 

„5 two „„o, 

larger than that for the forme , apparent paradox that two estimates 

formulas. How would you account for of precision? 

which lead to precisely the same value have differing deg ^ ^ 

Fo ,X-X, + E « 1»™ f „„„« theorem »e 

.. .-. *txsxxsssi 

reliability (r„„) is .91 and S a = S h - 20, mus 6 

measurement of 6. . „, mn iing the two form scores we will 

- IXSttttfSX*' How did you arrive at 

6 Would the reliability coefficient be different if y~ a ? ed the tW ° 
form scores instead of Y h “larger m smaller'standard error 

“C , otl „, 

10.17. It is sometimes said that intelligence can be held constant by choosing
individuals (in forming a group) with the same IQ, or same score, on an
intelligence test. Any comment?







10.18. W ha u 0 gi 1 connectio d0 you see be(ween (h£ 

' x, - x ( ) and r if corrected for attenuation ? S 

■86 with^the ^totaTscor^fincludin^ 6 ^ 1 ^ 6 ^^ ^ ^ vocabular y test correlates 
would you) to correct this r for atfenuaton ^ W ° U ' d y ° U hesitate (or 

SSLw'fSS ao C 18 V ) r :h d e th 1 Wh r en V - 50 ’ 50 = 4 ’ and - = 2 are 
make sense. ^ - I0 °’ Which 

wilt 1Xl + **> 

condttion would you expect r a to exceed .707 when rjs zeroT' Un<ier 
for 100 cases” ^ f ° n ° WinS f ° r thKe VariabIes and the 1 score on the three 


correlate^ W ° Uld > OU ex P ect ^ to 
^ C ° rre,ati0nS bet — the four variables, V, V, x, and Y as 


- .40, r w «.50, r xy — —.30 

Specify the numerical value of the following correlations- 

a. between Fand Kwith U constant. 

b. between X and V with V constant. 

d. between X and rXu ** “ ****** ^ ° f K '° nl y- 

e - for V and V with U constant. 

10.24. Rorschach testers, facing the fact that the score for a
category (say X) is influenced by the total number of responses (R), have
attempted to control R by using 100 X/R. Can
you suggest another scheme which would take care of
individual differences in R in the sense of yielding a score which is truly
independent of R?

70 per cent^ff the variance hi X^s'assoc^/c/ 5 aS h S ° dated with varia,ion in Xand 
about the correlation betweef ^ < * n * ““ 

It™ ofth^xtli 0 po h siLT ht ^ £XP,ain ~ “ 

Ihfi's thVcl7e,ftion 13 bcn W ee° i ; valableT 1 Tnd's ^ ° f ^ b£C ° meS J5; 

vctridDies j and 2 goes up when variable 3 is 
“partialed out” even though 3 is uncorrelated with 2. How would you explain
this?

10.28. We noted that the correlation coefficient is affected by heterogeneity
with respect to one or both of the variables being correlated and with respect to a
third variable, and we developed formulas which would correct for heterogeneity.
Now suppose we have an $r_{xy}$ based on 100 boys and 100 girls (N = 200).

a. Then by making separate sex distributions for X we find a marked
sex difference; how might our $r_{xy}$ be influenced by this sex difference?
Why?

b. Next it is discovered that there is also a sizable sex difference on Y.
Considering now that both variables show sex differences, what can
you say about the effect of such differences on $r_{xy}$? Again, why?

c. Can you propose (in rough outline) a scheme for getting rid of the
sex effect on $r_{xy}$?

CHAPTER 11 

11.1. How would you show that the choosing of multiple regression coefficients
so as to minimize the error of prediction tends to maximize multiple r? 

11.2. If a criterion measure has a reliability of only .60, what is the limiting 
percentage of the criterion’s variance that is predictable by any single predictor or 
any combination of predictors ? 

11.3. Suppose in determining the beta coefficients for a 3-variable multiple
correlation problem your calculations led to $\beta_2 = .70$ and $\beta_3 = .80$.

a. Why might we suspect error in your computations? 

b. But under what conditions might your values be correct? 

11.4. Consider the hypothetical multiple correlation situation involving a
dependent variable, $X_1$, and two independent variables, $X_2$ and $X_3$, each of which
correlates .707 with the dependent variable. If $r_{23} = 0$, and then a fourth
variable is found which also correlates .707 with the dependent variable, what
can you say regarding $r_{24}$ and $r_{34}$? Will both be zero or not? Why? (Negative
hint: do not waste time substituting in formulas.)

11.5. Suppose a 26-variable multiple regression problem with each of the 25
possible predictor variables correlating .20 with the criterion variable and
intercorrelating zero among themselves. Can you specify the value of the
multiple r? How did you get your answer? (Obviously, you are not expected
to answer this by using the Doolittle solution, so look for a “trick” solution.)

11.6. If each of m independent and uncorrelated variables yields a correlation 
of .30 with a dependent or criterion variable, how many of them would you need 
in order to build up a multiple r of .45? Of .90? 
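
Exercises 11.5 and 11.6 both rest on the special case in which the predictors are mutually uncorrelated, so that the squared multiple r is simply the sum of the squared correlations with the criterion. The sketch below assumes exactly that special case and nothing more general; the function names are illustrative and not the text's.

```python
import math


def multiple_r_uncorrelated(validities):
    """Multiple r when the predictors intercorrelate zero: the square root
    of the sum of squared predictor-criterion correlations."""
    return math.sqrt(sum(r * r for r in validities))


def predictors_needed(r_each, target_multiple_r):
    """Smallest number of uncorrelated predictors, each correlating r_each
    with the criterion, whose multiple r reaches target_multiple_r."""
    ratio = target_multiple_r ** 2 / r_each ** 2
    return math.ceil(ratio - 1e-9)   # small tolerance for floating-point error


if __name__ == "__main__":
    print(multiple_r_uncorrelated([0.20] * 25))
    print(predictors_needed(0.30, 0.45), predictors_needed(0.30, 0.90))
```

Where the required number is not a whole number, the function rounds up, since a fraction of a predictor cannot be used.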

11.7. Given the following regression equation in raw score form:

$X'_1 = 13.2X_2 + 39.1X_3 + 18.5$

What are the possible factors that might be responsible for 39.1 being treble 13.2?

11.8. We learned that the β weight for a suppressant variable tends to be
negative. Suppose a 3-variable (one dependent and two independent) multiple
regression equation in which $\beta_3$ is negative (say, -.40). Does it follow that
variable 3 is a suppressant? Why?

11.9. When we come to the analysis of variance test of the significance of 
multiple r we will learn that a multiple r based on N cases and m independent 
(or predictor) variables will have associated with it a specifiable number of 
degrees of freedom. Can you anticipate what the df will be? What is the basis 
for your answer ? 

11.10. Frequently the clinician will utilize a difference score, X - Y, as a
basis for predicting. Examples: in Rorschach this might be M - C; in the
Babcock scheme for measuring mental deterioration the average standing on
certain tests is subtracted from the score on a vocabulary test; on the
Wechsler-Bellevue the score on the block design test may be subtracted from the
information test score. Presuming that our clinical brethren use such difference scores
for predicting a quantitative something of some kind, what pertinent comments
can you make? (Note: there are at least two quite different considerations
involved here.)


CHAPTER 12 

12.1. Consider the fourfold table with entries A, B, C, and D, with the
corresponding proportions a, b, c, and d. How might we go about setting up a
measure of the success with which one variable can be predicted from another?
(Note: to answer by saying compute an r of some kind is not enough.)

12.2. When two judges (observers) classify N individuals into a dichotomy, the
amount of agreement between the two is frequently determined simply as the
percentage agreement. In terms of the usual fourfold table (judge I vs. judge II)
with frequencies A, B, C, and D, this procedure is equivalent to dividing B + C
by N. Do you see any objection to such a percentage measure of agreement?

12.3. Sometimes in determining the interrelationship of items (scored as pass or 
fail) we encounter a fourfold table with zero frequency in either the upper-left 
or lower-right hand cell. Under what circumstances would you expect this to 
happen? (Your answer should be in terms of observables.) 

12.4. What measure of correlation would you use to describe the relationship 
between sex (as near a point variable as one encounters in psychology) and 
passing or failing a test item? Why? 

12.5. We discussed eta, the correlation ratio. What do you suppose “biserial 
eta” stands for? Write out a reasonable guess as to the formula for such a 
measure. 

12.6. Occasionally we find instances in which the correlation between the 
percentile scores on two tests has been computed. Such a correlation coefficient 
resembles one type of correlation discussed somewhere in the text. Which? 
In what way ? 

12.7. Consider two variables, X and Y. Does saying that the two are uncorrelated
mean that Y and X are independent?

12.8. Sometimes a skewed distribution of scorn;^ ^qTamTo^of Now 

suppose r„ = .60 and that y B g the corre i a tion of these new 

Hszyr'X* r: 

zs^^»r?rty^ s ^zs> 

relationship, say theta, as __ 

theta = 1 ^2 

Wealsoeomputerandeta. Pr.biem, mr.ng, Ih.ia, .«. and r » » ma E nit«de. 

S: 

fashion, and both yielding . t (th last tw0 by dichotomizing near 

£ E2SSSS.“ “ — te “ p '“ d 

Sl. On, form of tbe fotm.i. ^— 

contains M 2 - My Now suppose e hence the computed correlation 

r r ksj* i, 

delinquents from nondelinquents and s pp ^ 9Q are nondelinquents 

«y about the e value of scores 

between 160 and 170? Any cautions? 

CHAPTER 13 

13.1. Under what circumstance is the chief assumption underlying the chi 

ST ^^mT^chol. Bull article on Jhe usti and misuse of^ch, 

*—• 1 ~ 

occurrence.” Can you illustrate what is meant by th . 

13.3. Specify briefly the types of situations for which it is easy to substitute the
simple binomial for chi square as a test of significance.

therff/oV^f ° f CW SqUare ’ Pears0n and his Flowers claimed that 

ensuing argumentby po'infinglt'that tt 

- *• - 

13.5. In what sense is chi square a “test of independence” ? 

13.6. The requirement that observations be independent (a requirement for chi
square) seems to be violated in a true fourfold contingency table. How would
you answer the beginner who argues that the fourfold table does violate
this assumption?

13.7. Occasionally in a contingency table a cell may have zero frequency.
Should this lead to any changes in using the chi square technique for testing
the hypothesis of independence? If so, what? If not so, why not?

Table VII 

Item 2 
F P 

p 0 10 10 


13.8. Given pass (P) and fail (F) information for two test items in Table VII.
Consider testing the two hypotheses: $H_1$ for independence

^ wo “ ,d “ ■* ~»“*« 

1 - ; 

but there are 2k observed frequencies. 1 ~ ] 

of7LndThatT2\?2 y co 3 n C t 0ntin8enCy K, table ’ Wi ‘ h " = 10 °’ y idds 3 Chi S q Uare 

Of 7 Q TU i 2 b 7 2 contln g enc y tab,e > with N = 100, also yields a chi square 
of 7.9. Thus the value of the contingency coefficient for each table becomes .27
(by substituting in the formula for C). Defend one or the other of the following
statements regarding the statistical significance of the two Cs.

a. Both are of equal significance. 

b. They differ in significance. 
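
As a rough illustration of why the degrees of freedom matter here, the sketch below computes the contingency coefficient and the upper-tail probability of chi square. It assumes a chi square of 7.9 with N = 100 (which gives C of about .27); the df values 1 and 4 and the function names are hypothetical choices for illustration, not taken from the exercise.

```python
import math

def contingency_coefficient(chi_sq, n):
    """C = sqrt(chi square / (N + chi square))."""
    return math.sqrt(chi_sq / (n + chi_sq))

def chi_square_upper_tail(chi_sq, df):
    """P(chi square with integer df >= chi_sq), built up by the recurrence
    Q(df + 2) = Q(df) + (x/2)^(df/2) * exp(-x/2) / Gamma(df/2 + 1)."""
    half = chi_sq / 2.0
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1   # closed form for df = 1
    else:
        tail, k = math.exp(-half), 2              # closed form for df = 2
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail

chi_sq, n = 7.9, 100                  # assumed values for illustration
c = contingency_coefficient(chi_sq, n)
for df in (1, 4):                     # df = 4 is an assumed larger table
    p = chi_square_upper_tail(chi_sq, df)
    print(f"df = {df}: C = {c:.2f}, P = {p:.4f}")
```

The same C can therefore correspond to quite different probabilities, depending on the size of the table from which the chi square came.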

hte^the^SS 6 l ° I""" Wh f her the i,ems on the 1937 Stanford-Binet 
1931 33 a .. y values as during the standardization testing period 

the chfficnlt ° n WhK ? 5 “’ 6 -’ 7 -’ and 8 -year-olds are tested might yield 
difficulties (percentages for passing) given in Table VIII.

Table VIII

             Age         5       6       7       8

1931-33      N         101     203     202     203
             %          26      41      58      71

1955         N          90     120     130     115
             %          20      37      49      68


(Group A) has been on a standard d e , ^ ^ 20. In order to see 

which is deficient m Vitamin . a food w ; t h high natural vitamin 

whether the vitamin deficien group y o f] 0W content, the members of each 

content (say food X) in preference to food J,° days . m Group A 

group are given a chan “ ^ ® °\ • Group B there are 70 preferences for 

there are 90 preferences for food X and i P He reasons 

food X. Now our “gator has tord of^ he chi sq ^ ^ ^ ^ x 

that by chance » be for X and in Group B 50 of the 

i ““*' ro,to " i " s “ b,e: 


                    Group A    Group B
Chance for X           75         50
Observed for X         90         70


from which he calculates chi square as $(90 - 75)^2/75 + (70 - 50)^2/50 = 11$. He
and seeing no restrictions

items: 

Table IX 

Item 2 

F P 

P 20 40 P 

Item 3 

Item 1 1 . F 

F 30 10 


Item 4 

F P 
0 10 

10 7 



a. Specify two meaningful null hypotheses that are testable vln 

ar ffr *» »• ■-"*«? »«• cssssg 

c 

In "“ i *— l ®* 

whtch ° ur groups have been drawn are alike in tLr response^ 

and the ,echniques ^ ou “ « in 

independent. Tabular setup ? Technique ? ™o items are 

a. To what does the n refer? 

b. Why the concern with plus values only? 

c. Why the use of 1.64 instead of 1.96 when you
consider that 1.96 is for a two-tailed test, whereas chi square, under
the appropriate condition, must equal 3.84 (or 1.96 squared) for a
two-tailed significance level of .05?

B £ xs^jssrsasas 

13.17. In general, would you expect chi square for a 2 x 2 table to be larger or
smaller than for a 3 x 3 table? Why? Under what circumstance could the
reverse be true?

three chi squares Precalculated each wfthTy 0 Th 0l) t b Wer u made ’ fr ° m Which 

CHAPTER 14


14.1. Suppose that in reading an older (circa 1920) study you find an $M_D$ and
$S_D$ (S of distribution of difference scores) based on N = 10. You wish to
re-evaluate $M_D$ via the t technique. If no actual scores are reported, how would
you proceed to get the needed unbiased estimate of the variance of the difference
scores?


14.2. In what way is the concept of “degrees of freedom” similar for chi square 
applied to frequencies and for the variance estimation situation ? 

14.3. Typically, one tail (the right-hand side) of the chi square distribution is 
involved in tests of significance. 

a. Under what specific condition does this one tail provide a two-sided,
or two-tailed, test ? 

b. State two entirely different types of situations one of which requires 
using both tails of the chi square distribution and the other of which 
requires one tail and an alertness for the other. 

14.4. How could you use F to determine whether a variance of 25, based on
a sample of N cases, deviates significantly from a hypothetical value of 16?

14.5. In a certain textbook on statistical method you find the following data
for N = 30 cases for two forms of a test having .93 as its form versus form
reliability:

              Mean     $s^2$
Form A        44.4     193.2
Form B        42.8     146.3


To judge whether the two forms differ in variability, the author takes 193.2/146.3 
as F. Any comment ? 

14.6. The distributions of scores on successive learning trials on a pursuit rotor
typically increase in variance from trial to trial. Why is Bartlett’s test not
applicable for testing the differences among such variances?

14.7. Given for the 1937 Stanford-Binet the following standard deviations (all 
testing done during March 1956, and no siblings): 


                   Age 3               Age 4
               Boys    Girls       Boys    Girls

Ns              50      48          49      51
Form L         17.5    18.0        16.8    16.5
Form M         16.2    17.2        16.3    17.6


a. For the best test of the differences between Ss, the difference 

17.5 — 16.2 would be tested by which formula? 

18.0 — 16.5 would be tested by which formula? 

17.6 — 16.3 would be tested by which formula? 


b. Specify (write down numerical values) a set of four Ss that could be
properly compared by Bartlett’s test for homogeneity. 

14.8. Suppose for a group of 60 college students it is found that their grades 
correlate .40 with a reading comprehension test and .50 with an intelligence test. 
If the grades (grade point ratios) are expressed in standard-score form with 
variance of unity, it is immediately seen that the two errors of estimate (or 
residual) variances are .84 and .75. The r between the two tests is .70. 

a. Since .84 and .75 are both $S^2$ values, how would you proceed to convert
them into $s^2$ values?

b. Presuming that you have the appropriate $s^2$ values, do you have any
suggestion as to how you might proceed to test the significance of the 
difference between the two estimated residual variances ? 
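
The two residual variances quoted in 14.8 follow from the relation "residual variance = total variance times (1 - r squared)" with the total variance set to unity. A minimal sketch of that arithmetic (the function name is a hypothetical label, not the text's notation):

```python
def residual_variance(r, total_variance=1.0):
    """Error-of-estimate variance for a criterion predicted from one variable."""
    return total_variance * (1.0 - r * r)

# Grades in standard-score form (variance of unity), as in the exercise.
for label, r in (("reading comprehension", 0.40), ("intelligence", 0.50)):
    print(f"{label}: r = {r:.2f}, residual variance = {residual_variance(r):.2f}")
```

The sketch only reproduces the stated $S^2$ values; the conversion to $s^2$ and the test of their difference are left as the exercise asks.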

CHAPTER 15 

15.1. The tabled values of F involve the ratio of two unbiased estimates of the
same population variance. Why the requirement of “same” population
variance? Is, or is not, this sameness concerned with the assumption of homogeneity
of variances? Why?

15.2. When we use F in the simple (one-way) analysis of variance to test the 
difference between, say, the means for four experimental conditions, the question 
(or requirement) of independence arises in at least three different places. Can 
you specify ? 

15.3. Cite two bits of evidence that the F test as used in the analysis of variance 
is a two-tailed test despite the fact that the tabled value for the .01 level is actually 
the .02 level when a two-tailed test is appropriate for testing hypotheses regarding 
difference in variability of two groups. 

15.4. Another textbook author says that the variance estimate based on the
within groups sum of squares (one-way classification with G groups)
“corresponds to a standard error of a difference as used in the t test.” In what way might
this statement be regarded as partially true, and how would it need to be modified
to make it exactly true?

15.5. With reference to the variance estimates indicated in Table 15.4, why
is it not permissible to take $F = s^2_r / s^2_w$ as a means of testing the hypothesis that
the residual variance is greater than the within arrays variance?

15.6. What application of chi square might be used to test the tenability of 
which assumption in simple analysis of variance ? Any cautions in arranging the 
data for such a test ? 

15.7. We had one F test in which the numerator involved eta squared minus r 2 , 
and another F test involving the difference between two multiple rs squared. In 
what way are the two Fs similar (or analogous) ? 

15.8. We learned of an F test for $r_{1.23}$ and for r. Do you see an easy way to test
the significance of the point biserial r by an F test? How?

15.9. We have seen how the goodness of fit of a normal curve to frequencies
can be checked (or tested) via the use of the chi square technique, and we have
seen how the F test can be used to test the goodness of fit of linear regression.
Suppose we have height measurements on 100 children at each age, 1 to 18
inclusive (cross-sectional, i.e., we are not following the same children from 1 to
18). All measurements are within a week of a birthday. It is sometimes argued
that growth follows the Gompertz curve, the equation of which is a double
exponential function:

$$Y = v g^{h^X}$$

in which v, g, and h are constants determinable from the data, X is age, and Y is
height (of the children).

a. Set up a variance table with appropriate symbols to indicate the
breakdown of the sum of squares (for Y), the degrees of freedom
(actual numerical values), and the variance estimates (in symbols)
which you would use to test the goodness of fit of the data to the
Gompertz curve. Specify symbolically the F ratio you would use.

b. In fitting a normal curve to a frequency distribution, we set up
expected frequencies. What in the fitting of a Gompertz curve would you
consider as analogous to “expected” frequency?

c. Would your proposed scheme for testing the fit of the Gompertz curve 
be valid in case of a longitudinal study (i.e., we follow same children 
from ages 1 to 18)? Defend your answer briefly. 
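
To make the double exponential form concrete, the following sketch evaluates the Gompertz curve Y = v g^(h^X) at each age. The constants v = 180, g = 0.3, and h = 0.85 are hypothetical values chosen only to give a plausible growth shape; in the exercise they would be determined from the data.

```python
def gompertz(x, v, g, h):
    """Gompertz curve: Y = v * g ** (h ** x)."""
    return v * g ** (h ** x)

# Hypothetical constants, not fitted to any data.
v, g, h = 180.0, 0.3, 0.85

for age in range(1, 19):
    print(f"age {age:2d}: fitted height = {gompertz(age, v, g, h):6.1f}")
```

With 0 < g < 1 and 0 < h < 1 the curve rises from v times g at X = 0 toward the upper asymptote v, so each age has a single curve value against which the observed heights at that age can be compared.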

CHAPTER 16 

16.1. If you mistakenly used simple analysis of variance for testing the difference 
between G means based on N sets of matched individuals, would you expect a 
smaller or larger P than would have resulted had you used the more appropriate 
two-way analysis of variance? Why? 

16.2. Consider the ordinary z test for the difference, via $M_D$, between correlated
means. What in this z test corresponds most closely to “interaction” ? Be more 
specific than a mere statement that “it” is found in the error term or in the 
numerator or in the denominator or in the correlation between scores. 

16.3. Given information regarding sex difference at birth for large samples of 
American and of British babies. The data can obviously be placed in a two-way 
analysis of variance setup, which would permit testing for a nationality difference 
as well as for a sex difference. The sex by nationality interaction could also be 
tested by the analysis of variance method. Could this interaction have been 
tested prior to the invention of the F technique ? How ? 

16.4. During our discussion of chi square we did not mention “interaction.” 
Suppose the following data for number of subjects who overcome “set” in the 
water-jar test: 

70 of 100 male science majors 
60 of 100 female science majors 
40 of 100 male history majors 
50 of 100 female history majors 
Can you specify an interaction, and how would you proceed to test it (the
interaction) for statistical significance? (Note: this is not a small sample 
situation.) 
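
One way to see what an interaction among these four proportions might look like is to compute a difference of differences and refer it to its large-sample standard error. The sketch below is only one possible approach under the assumption of four independent groups; the function name and the unpooled standard error are illustrative choices, not necessarily the answer the exercise has in mind.

```python
import math

def difference_of_differences(cells):
    """cells: four (successes, n) pairs ordered as
    male science, female science, male history, female history."""
    props = [(x / n, n) for x, n in cells]
    (p1, _), (p2, _), (p3, _), (p4, _) = props
    contrast = (p1 - p2) - (p3 - p4)          # sex difference compared across majors
    se = math.sqrt(sum(p * (1.0 - p) / n for p, n in props))
    return contrast, contrast / se

contrast, z = difference_of_differences([(70, 100), (60, 100), (40, 100), (50, 100)])
print(f"difference of differences = {contrast:.2f}, z = {z:.2f}")
```

The contrast asks whether the male-female difference among science majors departs from the male-female difference among history majors, which is the usual sense of interaction in a two-way layout.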

16.5. Suppose you are setting up an experiment for the express purpose of 
testing the interactive effect of two factors. This could, of course, call for an 
R by C factorial design, fixed constants model, with m independent cases per cell. 
If it were feasible to run 240 subjects, would you prefer a 3 x 4 or a 4 x 6
design (i.e., 3 levels for one factor and 4 for the other or 4 levels for one and 6 for 
the other) ? What are the statistical considerations involved in your choice ? 

16.6. Suppose a two-way classification calling for the fixed constants model. 
Would you prefer to have RC cases with measurement replication (thus leading 
to m scores per cell) or would you prefer mRC individuals (also leading to m 
scores per cell) ? Why ? 

16.7. In order to ascertain the effect of varying the color of the stimulus patch 
on critical flicker fusion you might plan to use 10 subjects, each measured 6 times 
under each of four color conditions: red, blue, yellow, and green (brightness 
controlled). Set up a schematic variance table, with sources, variance estimates, 
and numerical dfs. Indicate legitimate Fs (actual variance ratios, in symbols) 
that can be used to test hypotheses, and specify the generality of conclusions you 
would draw from possible significant Fs. 

16.8. Suppose you are consulted by a statistically naive individual who has 
the notion that he can plan a study having to do with the effects of height and 
weight on basal metabolism by using a factorial design with height and weight 
as the basis for a two-way classification. 

a. Any warnings to him regarding possible difficulties in such a design ? 

b. What plan would you suggest instead? 

16.9. Suppose researchers A and B both start with 12 litters of rats, 4 rats per 
litter. Both use identical T mazes. Researcher A splits each litter randomly so 
as to have four groups which are run under four different degrees of food
deprivation. To test the between groups differences (deprivation effects), A computes an
F with an interaction variance as “error.” Researcher B runs all his rats under 
one condition, then calculates an F as the between litters variance estimate 
over the within litters variance estimate. (Each rat in both experiments has just 
one score.) 

a. Specify the degrees of freedom for A’s F and for B’s F. 

b. Why did A use an interaction variance estimate as “error” whereas B 
used a within variance estimate as his “error” term? 

16.10. In a recent Journal of Psychology will be found the results, Table X, for
an experiment designed to learn whether seeing the movie “Gentleman’s
Agreement” leads to changes in attitude toward the Jews. The researcher used the
Levinson-Sanford Questionnaire on Anti-Semitism, an instrument with a
reported reliability of .98. As usual in such studies, an experimental group was
shown the movie, and a control group was not. Both groups were pretested with 
the questionnaire, and after the experimentals had seen the movie both groups 
were retested. The Ns were 50 and 90. 

Table X. Summary statistics

                         Experimental             Control

Mean, pretest            23.55, S_M = 2.87        26.54, S_M = 2.26
Mean, posttest           16.40, S_M = 4.10        27.06, S_M = 2.79
Test-retest r            .64                      .84
Difference in means      7.15                     .52
S_DM                     3.16                     1.63
z                        2.26                     .33


It was concluded that because the experimentals show a significant (at P = .03 
level) difference, whereas the controls do not, the movie did lead to changes 
in attitude. Do you see anything wrong with his statistical treatment? 

16.11. Because a smaller F is needed for a prescribed level of significance as the
df for the denominator becomes larger, it has been argued frequently that we 
should strive to increase this df. 

a. Consider Design A with, say, 20 cases in each of two experimental 
groups having been assigned randomly and independently to the 
two groups in contrast to Design B in which we also have 20 cases 
per group, but the groups have been matched on a thought-to-be 
relevant variable by setting up 20 pairs of individuals. Which design 
provides the greater df and why might it be unwise to use the design 
with the larger df?

b. Consider the test for the significance of linear correlation. As we 
outlined the procedure, the df for the denominator variance (residual)
was N — 2. Now in some texts we can find that the denominator 
variance is taken as the within array variance about the array mean 
with df = N-G where G is the number of arrays. Obviously, the 
dfs differ according to which variance is being used as “error.” Why 
might the test based on N - 2 df be no more apt to lead to significant
Fs than the one based on N - G df?

c. Part b involves a within array variance about the array means and a 
within array variance about a regression line. Why can’t we test the 
difference between these two variances by taking F as their ratio?
Do you see a possible indirect method for making an inference 
regarding their difference? 

16.12. Consider a two-way layout for the scores of 30 persons on C = 3 forms
of a test.

a. The remainder term will provide an estimate of error of measurement 
variance with how many (numerical value, please) degrees of freedom ? 

b. If your statistical clerk regarded the data as simply 30 groups of 
scores with m = 3 per group, he (or she) would have a possibly 
different estimate of error of measurement variance in the within 
persons term, which would have (numerical) how many degrees of 
freedom? 

c. The “possibly different” above implies what possible major source
of difference, aside from dfs, in these two ways of estimating error
variance?

d. How would you test the difference between form means ? 

e. How would you test the significance of the difference between the
three form standard deviations? (Note: the available method may 
not be entirely satisfactory for a reason which you might specify.) 

16.13. Lay out a series of possible plans for studying the effect of illumination
and the effect of foveal versus peripheral vision on critical flicker fusion (CFF).
Let us agree to use five levels of illumination and four areas of retinal stimulation
(foveal plus three areas proceeding outward from the fovea toward the periphery).
For each approach, indicate the sources of variation, the dfs, the variance
estimates, and appropriate Fs. Evaluate the relative merits of your several plans.
(Note: CFF is not affected by practice and can be measured in a couple of
minutes.)

16.14. In an issue of the Journal of Consulting Psychology will be found a 
study of two groups (paranoid schizophrenics and normals), 27 cases per group. 
We are told that “the groups were equated for age, education, and intelligence.” 
The dependent variable is a measure of “distortion” of responses to four stories. 
The authors present an analysis of variance table, given here as Table XI. 


Table XI. Analysis of variance

Source                  df      SSqs     Var. Est.      F        P

Groups                   1     13.74       13.74      19.35    <.01
Stories                  3     17.81        5.94       8.37    <.01
Individuals             53    144.87        2.73       3.85    <.01
G by S interaction       3      4.56        1.52       2.14    >.05
Error                  144    101.73         .71
Total                  204    282.71

What, if anything, is wrong with their statistical analysis? 
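
Before judging the design, it can help to confirm that the printed entries of Table XI are internally consistent. The sketch below recomputes each variance estimate as SSqs/df and each F as the variance estimate over the "Error" estimate, rounding to two decimals as the table does. It checks arithmetic only; it says nothing about whether the df or the choice of error term is appropriate, which is the point of the exercise. The function names are hypothetical.

```python
rows = [
    # (source, df, sum of squares) as printed in Table XI
    ("Groups", 1, 13.74),
    ("Stories", 3, 17.81),
    ("Individuals", 53, 144.87),
    ("G by S interaction", 3, 4.56),
    ("Error", 144, 101.73),
]

def variance_estimates(table):
    """Sum of squares divided by df, rounded as in the printed table."""
    return {source: round(ss / df, 2) for source, df, ss in table}

def f_ratios(table, error_source="Error"):
    """Each variance estimate over the one labeled as error."""
    ms = variance_estimates(table)
    return {s: round(v / ms[error_source], 2)
            for s, v in ms.items() if s != error_source}

total_df = sum(df for _, df, _ in rows)
total_ss = round(sum(ss for _, _, ss in rows), 2)

print(variance_estimates(rows))
print(f_ratios(rows))
print("total df =", total_df, " total SSqs =", total_ss)
```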

16.15. In practically all applications of the F test the hypothesis being tested 
determines the variance estimate to be placed in the numerator for the F ratio, 
and usually we do not bother to compute F if this numerator variance is smaller 
than the variance chosen as “error” for the denominator. Occasionally the 
numerator variance is so small relative to the chosen “error” variance that F
taken upside down is significant. In discussing this, another text says “The 
situations where the F obtained in this manner is significant probably have no 
reasonable interpretation other than that they are the occasionally significant 
values which are to be expected in random sampling.” Any comment? 

16.16. A significant upside down F has its counterpart in what kind of a chi 
square ? 

16.17. Do you think it possible, in a three-way fixed effects design, to have
(a) three-way interaction without two-way interaction and (b) vice versa? 
Explain. (The use of diagrams may help here.) 

16.18. It has been suggested that when an interaction is significant (fixed 
constants), we should proceed to a series of analyses of variance one lower in 
order. That is, a significant R x B x C interaction should be followed by, 
say, B two-way analyses involving rows and columns; and a significant R x C 
interaction should lead to, say R one-way analyses. What possible sense can you 
make of this suggestion ? 

16.19. The author of a recent letter criticizes this textbook for advocating,
without qualifications, the use of, e.g., $s^2_{rc}$ for testing column effects in an
R x B x C design. He says that $s^2_{rc}$ should be used only when it is larger (not
necessarily significantly larger) than $s^2_{rbc}$. Stated differently, he says that if $s^2_{rc}$ is
smaller than $s^2_{rbc}$ the latter should be used as “error.” Obviously, his worry is
about those situations for which $s^2_{rc}$ and $s^2_{rbc}$ do not differ significantly. He
mentions “pooling” after making the point that “one could use either term,” i.e.,
either $s^2_{rc}$ or $s^2_{rbc}$ as “error.” Aside from “pooling,” and in rather simple
commonsense (but not statistically naive) terms,

a. What argument do you think he set forth in favor of using $s^2_{rc}$ only
when it is larger (though not significantly so) than $s^2_{rbc}$ as “error”
for F?

b. What argument would you use against his proposal? Note that this 
part can be answered even though you cannot answer part a. 

16.20. Do you see any possible way of utilizing the analysis of variance
technique for testing the hypothesis that the correlation between two variables is
perfect within limits imposed by errors of measurement? This is the correction 
for attenuation problem under another guise. We intend this to be a hard 
question, and the only hint is to think in terms of standard scores for the two 
variables being correlated. 

CHAPTER 17 

17.1. Occasionally, linear and quadratic trend analysis has been used in 
situations involving an independent “variable” that is actually qualitative but for 
which the investigator argues that in terms of the expected outcome for the 
dependent variable the “levels” on the independent variable can be ordered. 
Accordingly the qualitatively different levels are treated as though on a scale with 
equal spacings (distances) from level to level as a basis for trend analysis. Do you 
see any difficulty in this procedure ? 

17.2. In a recently published textbook is found an example in which five 
subjects are measured on three successive trials. The breakdown of the total sum 
of squares quite properly leads to sums of squares for trials (df = 2), for subjects 
( df = 4), and for subject by trial interaction (df = 8). The sum of squares for 
trials is 90.00, and the sum of squares for linear trend is also 90.00 (df = 1).

a. What implication does this equality of the two sums of squares have 
for a possible quadratic component? 

b. The author failed to follow through with any implication regarding
the consequence of having unequal dfs (2 and 1) for the trial and the
linear component sums of squares (of the same amount). What 
helpful remark could he have made ? 
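
The breakdown described in 17.2 can be illustrated with orthogonal polynomial coefficients for three equally spaced trials. The trial means (2, 5, 8) below are hypothetical, chosen only because with five subjects they give a trials sum of squares of 90; the published example's actual means are not given here.

```python
def trend_sums_of_squares(means, n):
    """Split the between-trials sum of squares for three equally spaced
    trials into linear and quadratic components.
    means: the three trial means; n: number of subjects per trial."""
    linear, quadratic = (-1, 0, 1), (1, -2, 1)
    grand = sum(means) / len(means)
    ss_trials = n * sum((m - grand) ** 2 for m in means)

    def component(coefs):
        contrast = sum(c * m for c, m in zip(coefs, means))
        return n * contrast ** 2 / sum(c * c for c in coefs)

    return ss_trials, component(linear), component(quadratic)

ss_trials, ss_linear, ss_quadratic = trend_sums_of_squares((2.0, 5.0, 8.0), n=5)
print(ss_trials, ss_linear, ss_quadratic)
```

Because the two components (1 df each) add up to the trials sum of squares (2 df), whatever the linear component does not account for is the quadratic component.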

17.3. Consider the situation for which we have just three levels and linear and 
quadratic components have been “taken out.” What can be said, before you see 
the results, about the fit of a quadratic equation (curve) to the three means? 
Does it follow that the quadratic component must be statistically significant? 

17.4. No use of individual slopes was made prior to p. 354 of the text. How 
could we use individual slopes in the setup involving a single linear trend based 
on correlated observations ? 

17.5. Suppose the r between Y and X is .32 and the correlation ratio (eta) for
Y on X is .40, both computed for a sample of N = 100 with G = 14 intervals on
the x axis. The F for testing eta will have 13 and 86 degrees of freedom, whereas
the F for testing r will have 1 and 98 degrees of freedom. When the Fs are
computed and the significance levels determined, we have $F_{eta}$ = 1.26, P about .30,
and $F_r$ = 13.76, P about .001. How do you account for r, though smaller than
eta, being the more significant? (Do not forget that the larger the $n_1$ in the F
table, the greater the significance for Fs of the same size.)

17.6. When we have the Case XV setup, we can test the differences among the 
possible B linear trends by following the procedure given on pp. 354-355. 
Suppose we wish to test the differences between the quadratic components of the 
B trend lines. How would you do this? (Note: You might first take out the 
linear components or you might proceed directly to the quadratic part.) 

CHAPTER 18 

18.1. Line 5 of Table 18.2 indicates the possibility of computing an r based on 
between sums. Why would such an r be meaningless when G = 2? 

18.2. Suppose an appreciable negative correlation between X and an
uncontrolled variable Y. Defend either the proposition that the covariance adjustment
can be used or the proposition that it cannot be used. 

18.3. The correlation between two variables based on combined sex groups will 
be distorted by possible sex differences on either or both variables. How might 
you proceed to obtain a single r that is not distorted by the sex difference? 

18.4. We failed to mention the problem of variance homogeneity when
discussing the covariance adjustment technique. Specifically, what variances are
assumed to be homogeneous? Your reasoning? 

18.5. An assumption underlying the covariance adjustment technique is
homogeneity of regression from group to group. Does the text provide a method for
testing this assumption? Where (or what test)?

CHAPTER 19 

19.1. Suppose you have 22 cases measured under normal (control) conditions, 
then measured under a prescribed experimental condition. You are interested 
in evaluating the changes, but because of marked skewness (in which scores?)
you are skeptical of the tenability of the t test. Do you see a way of testing the 
significance of the changes by a method for which you might use chi square as an 
approximation ? 

19.2. A possible extension of the general idea of the median test for two or more
groups would be to classify the scores of each group into four categories
according to their position relative to $Q_3$, $Q_2$, and $Q_1$ based on the combined groups.
This would lead to a 4 by k table when we have k groups, from which a chi
square with 3(k - 1) df could be computed. Aside from difficulties when samples
are very small and the loss of efficiency caused by grouping, do you see any
possible problem in connection with the meaning of a significant chi square from
such a setup?

19.3. While reviewing a manuscript submitted to Psychological Monographs,
the author encountered a two-way fixed effects design in which rows stood for
eight different “treatments” and columns stood for two groups (pilots and
nonpilots), with 16 (independent) cases per cell. Apparently the writer of the
manuscript was a devotee of nonparametric methods; instead of testing the T
by G interaction by the conventional F test with the within cells as the error term,
he used Kendall’s tau. If, his argument goes, the tau for the eight sets of
means is significantly negative he would conclude that the interaction is significant.
This tau is, of course, the correlation between a rank-ordering of the means in the
first column and a rank-ordering of the means in the second column, with n = 8
pairs of ranks. For n = 8, tau must reach (negative) .49 for significance at the
chosen level. Now there are three distinctly different bugs in this, every one of
which nullifies his procedure. OK, try your critical powers.

CHAPTER 20 

20.1. When experimental and control groups are set up on the basis of
individuals paired on two control variables, the gain in precision (or the error
reduction) depends upon what fact(s) presumably available before carrying out
the experiment?

20.2. Suppose you wish to do an experiment in which the cost per experimental 
subject is far greater than that for a control. Accordingly you decide to take 
$N_C$ as $4N_E$. What scheme, other than randomization, would you use to assure
comparability for the two groups? And how would you proceed to test for 
significance the difference between the means of the two groups ? 

20.3. In a recent study of sex differences in problem solving (X), the fact of
differences in general intelligence as measured by a college aptitude test (Y) was
taken into account by comparing males and females who had been paired on Y.

a. What statistical procedure do you think was used in testing the null
hypothesis of no sex difference on X?

b. Can you suggest an alternative experimental-statistical plan for 
getting at sex difference on X with Y controlled ? 

20.4. For the large sample situation, the sampling variance of a mean based on a
sample stratified on variable U is given by $S^2_{\bar{x}} = S^2_x (1 - r^2_{xu})/N$, and for the
sampling variance of the difference between means based on groups matched as
to distribution on control variable Y (not individual pairing), we have

$$S^2_{D_M} = S^2_{x_1}(1 - r^2_{xy_1})/N_1 + S^2_{x_2}(1 - r^2_{xy_2})/N_2$$

Perhaps you will have noted the similarity of sampling variance for the stratified
sampling situation and the matched distributions situation. Do you see a
connection between the foregoing formulas and the analysis of variance technique?

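
A small numerical sketch may make the two formulas in 20.4 easier to compare. The variances, correlations, and sample sizes below are hypothetical, and the function names are illustrative labels rather than the text's notation.

```python
def stratified_mean_variance(s2_x, r_xu, n):
    """Large-sample sampling variance of a mean from a sample
    stratified on U: S^2_x * (1 - r_xu^2) / N."""
    return s2_x * (1.0 - r_xu ** 2) / n

def matched_difference_variance(s2_x1, r_xy1, n1, s2_x2, r_xy2, n2):
    """Sampling variance of the difference between means for groups
    matched as to distribution on Y: the sum of two terms of the same form."""
    return (stratified_mean_variance(s2_x1, r_xy1, n1)
            + stratified_mean_variance(s2_x2, r_xy2, n2))

# Hypothetical values: variance 100, correlation .60, 50 cases per group.
print(stratified_mean_variance(100.0, 0.60, 50))
print(matched_difference_variance(100.0, 0.60, 50, 100.0, 0.60, 50))
```

In both cases the variance of X is reduced by the factor (1 - r squared), that is, only the part of the variance not associated with the control variable enters the error term.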
20.5. When attempting to evaluate the relative effect of two movies on attitudes 
(measured in a continuous fashion), we may form two groups by random
assignment of individuals and then we may follow either (a) the procedure of a pretest,
show movie, posttest, with the statistical analysis based on a comparison of the
two mean changes or (b) the “after only” plan in which one movie is shown to
each group after which both groups are tested and the difference between the 
resulting “after” means is tested for significance. This second, or “after only,” 
method is frequently more feasible than the first procedure. In general, which 
design would you expect to be more precise ? Why ? Can you specify a condition 
which might make the other design more precise? (Hint: presume that all four 
possible standard deviations are equal.) 

Table A. Normal curve functions (continued)

z or x/σ     Area: m to z     Area: q Smaller     y or Ordinate

  1.75          .45994                               .0863
  1.80          .46407                               .0790
  1.85          .46784                               .0721
  1.90          .47128                               .0656
  1.95          .47441            .02559             .0596
  2.00          .47725            .02275             .0540
  2.05          .47982                               .0488
  2.10          .48214            .01786             .0440
  2.15          .48422                               .0396
  2.20          .48610            .01390             .0355
  2.25          .48778            .01222
  2.30          .48928            .01072             .0283
  2.35          .49061            .00939             .0252
  2.40          .49180            .00820             .0224
  2.45          .49286            .00714             .0198
  2.50          .49379            .00621             .0175
  2.55          .49461            .00539             .0154
  2.60          .49534            .00466             .0136
  2.65          .49598            .00402             .0119
  2.70          .49653            .00347             .0104
  2.75          .49702            .00298             .0091
  2.80          .49744            .00256             .0079
  2.85          .49781            .00219             .0069
  2.90          .49813            .00187             .0060
  2.95          .49841            .00159             .0051
  3.00          .49865            .00135
  3.25          .49942            .00058
  3.50          .49977            .00023
  3.75          .49991
  4.00          .49997                               .0001
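
Any row of Table A can be recomputed from the unit normal curve, which is handy for checking an entry or filling one that is illegible. The sketch below uses only the Python standard library; the function name is a hypothetical label.

```python
import math

def normal_curve_row(z):
    """Return (area from mean to z, smaller area q, ordinate y)
    for the unit normal curve at a nonnegative z."""
    area_mean_to_z = 0.5 * math.erf(z / math.sqrt(2.0))
    smaller_area = 0.5 - area_mean_to_z
    ordinate = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return area_mean_to_z, smaller_area, ordinate

for z in (1.95, 2.00, 2.50):
    area, q, y = normal_curve_row(z)
    print(f"z = {z:.2f}: area = {area:.5f}, q = {q:.5f}, y = {y:.4f}")
```

The smaller area q is simply .5 minus the area from the mean to z, so the two area columns always sum to .50000.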
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Table B. Transformation of r to £ 


  r       z        r       z        r       z

 .01    .010      .34    .354      .67    .811
 .02    .020      .35    .366      .68    .829
 .03    .030      .36    .377      .69    .848
 .04    .040      .37    .389      .70    .867
 .05    .050      .38    .400      .71    .887
 .06    .060      .39    .412      .72    .908
 .07    .070      .40    .424      .73    .929
 .08    .080      .41    .436      .74    .950
 .09    .090      .42    .448      .75    .973
 .10    .100      .43    .460      .76    .996
 .11    .110      .44    .472      .77   1.020
 .12    .121      .45    .485      .78   1.045
 .13    .131      .46    .497      .79   1.071
 .14    .141      .47    .510      .80   1.099
 .15    .151      .48    .523      .81   1.127
 .16    .161      .49    .536      .82   1.157
 .17    .172      .50    .549      .83   1.188
 .18    .181      .51    .563      .84   1.221
 .19    .192      .52    .577      .85   1.256
 .20    .203      .53    .590      .86   1.293
 .21    .214      .54    .604      .87   1.333
 .22    .224      .55    .618      .88   1.376
 .23    .234      .56    .633      .89   1.422
 .24    .245      .57    .648      .90   1.472
 .25    .256      .58    .663      .91   1.528
 .26    .266      .59    .678      .92   1.589
 .27    .277      .60    .693      .93   1.658
 .28    .288      .61    .709      .94   1.738
 .29    .299      .62    .725      .95   1.832
 .30    .309      .63    .741      .96   1.946
 .31    .321      .64    .758      .97   2.092
 .32    .332      .65    .775      .98   2.298
 .33    .343      .66    .793      .99   2.647
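
The entries of Table B follow from the transformation z = ½ log_e[(1 + r)/(1 − r)]. The sketch below assumes that formula and uses an illustrative function name not found in the text.

```python
import math

def r_to_z(r):
    """Transform a correlation coefficient r into Fisher's z."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    # z = 0.5 * ln((1 + r) / (1 - r)), the inverse hyperbolic tangent of r.
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

for r in (0.50, 0.80, 0.99):
    print(f"r = {r:.2f}  z = {r_to_z(r):.3f}")
```

For r = .50, .80, and .99 this gives .549, 1.099, and 2.647, matching the tabled values.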
Table C. Transformation of z to r*


  z     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09

  .0   .0000   .0100   .0200   .0300   .0400   .0500   .0599   .0699   .0798   .0898
  .1   .0997   .1096   .1194   .1293   .1391   .1489   .1586   .1684   .1781   .1877
  .2   .1974   .2070   .2165   .2260   .2355   .2449   .2543   .2636   .2729   .2821
  .3   .2913   .3004   .3095   .3185   .3275   .3364   .3452   .3540   .3627   .3714
  .4   .3800   .3885   .3969   .4053   .4136   .4219   .4301   .4382   .4462   .4542
  .5   .4621   .4699   .4777   .4854   .4930   .5005   .5080   .5154   .5227   .5299
  .6   .5370   .5441   .5511   .5580   .5649   .5717   .5784   .5850   .5915   .5980
  .7   .6044   .6107   .6169   .6231   .6291   .6351   .6411   .6469   .6527   .6584
  .8   .6640   .6696   .6751   .6805   .6858   .6911   .6963   .7014   .7064   .7114
  .9   .7163   .7211   .7259   .7306   .7352   .7398   .7443   .7487   .7531   .7574
 1.0   .7616   .7658   .7699   .7739   .7779   .7818   .7857   .7895   .7932   .7969
 1.1   .8005   .8041   .8076   .8110   .8144   .8178   .8210   .8243   .8275   .8306
 1.2   .8337   .8367   .8397   .8426   .8455   .8483   .8511   .8538   .8565   .8591
 1.3   .8617   .8643   .8668   .8692   .8717   .8741   .8764   .8787   .8810   .8832
 1.4   .8854   .8875   .8896   .8917   .8937   .8957   .8977   .8996   .9015   .9033
 1.5   .9051   .9069   .9087   .9104   .9121   .9138   .9154   .9170   .9186   .9201
 1.6   .9217   .9232   .9246   .9261   .9275   .9289   .9302   .9316   .9329   .9341
 1.7   .9354   .9366   .9379   .9391   .9402   .9414   .9425   .9436   .9447   .9458
 1.8   .9468   .9478   .9488   .9498   .9508   .9518   .9527   .9536   .9545   .9554
 1.9   .9562   .9571   .9579   .9587   .9595   .9603   .9611   .9618   .9626   .9633
 2.0   .9640   .9647   .9654   .9661   .9668   .9674   .9680   .9686   .9693   .9699
 2.1   .9704   .9710   .9716   .9722   .9727   .9732   .9738   .9743   .9748   .9753
 2.2   .9757   .9762   .9767   .9771   .9776   .9780   .9785   .9789   .9793   .9797
 2.3   .9801   .9805   .9809   .9812   .9816   .9820   .9823   .9827   .9830   .9834
 2.4   .9837   .9840   .9843   .9846   .9849   .9852   .9855   .9858   .9861   .9864
 2.5   .9866   .9869   .9871   .9874   .9876   .9879   .9881   .9884   .9886   .9888
 2.6   .9890   .9892   .9894   .9897   .9899   .9901   .9903   .9904   .9906   .9908
 2.7   .9910   .9912   .9914   .9915   .9917   .9919   .9920   .9922   .9923   .9925
 2.8   .9926   .9928   .9929   .9931   .9932   .9933   .9935   .9936   .9937   .9938
 2.9   .9940   .9941   .9942   .9943   .9944   .9945   .9946   .9948   .9948   .9950


* Table C is abridged from Table VII of Fisher and Yates: Statistical tables for biological, agricultural and medical research, Oliver and Boyd, Ltd., Edinburgh, by permission of the authors and publishers.
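
Table C is read by taking the first decimal of z from the row and the second decimal from the column. As a minimal sketch, assuming the tabled r is the hyperbolic tangent of z (the inverse of the r-to-z transformation), the same lookup can be computed; the function name is illustrative only.

```python
import math

def z_to_r(z):
    """Transform Fisher's z back into a correlation coefficient r."""
    # r = (e^(2z) - 1) / (e^(2z) + 1), the hyperbolic tangent of z.
    return math.tanh(z)

# Row 1.2, column .03 of Table C corresponds to z = 1.23.
print(f"{z_to_r(1.23):.4f}")  # 0.8426
```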
Table D. Distribution of χ² *

  n   P = .99     .98      .95      .90      .80      .70      .50

  1    .00016   .00063   .0039    .016     .064     .15      .46
  2    .02      .04      .10      .21      .45      .71     1.39
  3    .12      .18      .35      .58     1.00     1.42     2.37
  4    .30      .43      .71     1.06     1.65     2.20     3.36
  5    .55      .75     1.14     1.61     2.34     3.00     4.35
  6    .87     1.13     1.64     2.20     3.07     3.83     5.35
  7   1.24     1.56     2.17     2.83     3.82     4.67     6.35
  8   1.65     2.03     2.73     3.49     4.59     5.53     7.34
  9   2.09     2.53     3.32     4.17     5.38     6.39     8.34
 10   2.56     3.06     3.94     4.86     6.18     7.27     9.34
 11   3.05     3.61     4.58     5.58     6.99     8.15    10.34
 12   3.57     4.18     5.23     6.30     7.81     9.03    11.34
 13   4.11     4.76     5.89     7.04     8.63     9.93    12.34
 14   4.66     5.37     6.57     7.79     9.47    10.82    13.34
 15   5.23     5.98     7.26     8.55    10.31    11.72    14.34
 16   5.81     6.61     7.96     9.31    11.15    12.62    15.34
 17   6.41     7.26     8.67    10.08    12.00    13.53    16.34
 18   7.02     7.91     9.39    10.86    12.86    14.44    17.34
 19   7.63     8.57    10.12    11.65    13.72    15.35    18.34
 20   8.26     9.24    10.85    12.44    14.58    16.27    19.34
 21   8.90     9.92    11.59    13.24    15.44    17.18    20.34
 22   9.54    10.60    12.34    14.04    16.31    18.10    21.34
 23  10.20    11.29    13.09    14.85    17.19    19.02    22.34
 24  10.86    11.99    13.85    15.66    18.06    19.94    23.34
 25  11.52    12.70    14.61    16.47    18.94    20.87    24.34
 26  12.20    13.41    15.38    17.29    19.82    21.79    25.34
 27  12.88    14.12    16.15    18.11    20.70    22.72    26.34
 28  13.56    14.85    16.93    18.94    21.59    23.65    27.34
 29  14.26    15.57    17.71    19.77    22.48    24.58    28.34
 30  14.95    16.31    18.49    20.60    23.36    25.51    29.34

* Table D is abridged from Table IV of Fisher and Yates: Statistical tables for biological, agricultural and medical research, Oliver and Boyd, Ltd., Edinburgh, by permission of the authors and publishers.



APPENDIX 




Table D. Distribution of χ² * (continued)

  n   P = .30     .20      .10      .05      .02      .01      .001

  1    1.07     1.64     2.71     3.84     5.41     6.64    10.83
  2    2.41     3.22     4.60     5.99     7.82     9.21    13.82
  3    3.66     4.64     6.25     7.82     9.84    11.34    16.27
  4    4.88     5.99     7.78     9.49    11.67    13.28    18.46
  5    6.06     7.29     9.24    11.07    13.39    15.09    20.52
  6    7.23     8.56    10.64    12.59    15.03    16.81    22.46
  7    8.38     9.80    12.02    14.07    16.62    18.48    24.32
  8    9.52    11.03    13.36    15.51    18.17    20.09    26.12
  9   10.66    12.24    14.68    16.92    19.68    21.67    27.88
 10   11.78    13.44    15.99    18.31    21.16    23.21    29.59
 11   12.90    14.63    17.28    19.68    22.62    24.72    31.26
 12   14.01    15.81    18.55    21.03    24.05    26.22    32.91
 13   15.12    16.98    19.81    22.36    25.47    27.69    34.53
 14   16.22    18.15    21.06    23.68    26.87    29.14    36.12
 15   17.32    19.31    22.31    25.00    28.26    30.58    37.70
 16   18.42    20.46    23.54    26.30    29.63    32.00    39.25
 17   19.51    21.62    24.77    27.59    31.00    33.41    40.79
 18   20.60    22.76    25.99    28.87    32.35    34.80    42.31
 19   21.69    23.90    27.20    30.14    33.69    36.19    43.82
 20   22.78    25.04    28.41    31.41    35.02    37.57    45.32
 21   23.86    26.17    29.62    32.67    36.34    38.93    46.80
 22   24.94    27.30    30.81    33.92    37.66    40.29    48.27
 23   26.02    28.43    32.01    35.17    38.97    41.64    49.73
 24   27.10    29.55    33.20    36.42    40.27    42.98    51.18
 25   28.17    30.68    34.38    37.65    41.57    44.31    52.62
 26   29.25    31.80    35.56    38.88    42.86    45.64    54.05
 27   30.32    32.91    36.74    40.11    44.14    46.96    55.48
 28   31.39    34.03    37.92    41.34    45.42    48.28    56.89
 29   32.46    35.14    39.09    42.56    46.69    49.59    58.30
 30   33.53    36.25    40.26    43.77    47.96    50.89    59.70

* Table D is abridged from Table IV of Fisher and Yates: Statistical tables for biological, agricultural and medical research, Oliver and Boyd, Ltd., Edinburgh, by permission of the authors and publishers.
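
A common use of Table D is to compare a computed χ² with the tabled value for the appropriate n (degrees of freedom). The sketch below assumes the usual sum of (observed − expected)² / expected; the frequencies are hypothetical, and the dictionary holds only the P = .05 column for n = 1 to 4.

```python
# P = .05 values copied from Table D for n = 1..4.
CHI2_AT_05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49}

def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical frequencies in three categories, equal expectation in each.
observed = [30, 50, 40]
expected = [40, 40, 40]
n = len(observed) - 1                 # degrees of freedom for this case
chi2 = chi_square(observed, expected)
significant = chi2 > CHI2_AT_05[n]
print(chi2, n, significant)           # 5.0 2 False
```

Here χ² = 5.0 falls short of the tabled 5.99 for n = 2, so the discrepancy would not be called significant at the .05 level.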





PSYCHOLOGICAL STATISTICS

Table E. Distribution of t *

  n    p = .1      .05       .02       .01      .001

  1     6.314    12.706    31.821    63.657   636.619
  2     2.920     4.303     6.965     9.925    31.598
  3     2.353     3.182     4.541     5.841    12.941
  4     2.132     2.776     3.747     4.604     8.610
  5     2.015     2.571     3.365     4.032     6.859
  6     1.943     2.447
  7     1.895     2.365
  8     1.860     2.306
  9     1.833     2.262
 10     1.812     2.228
 11     1.796     2.201
 12     1.782     2.179
 13     1.771     2.160
 14     1.761     2.145
 15     1.753     2.131
 16     1.746     2.120
 17     1.740     2.110
 18     1.734     2.101
 19     1.729     2.093
 20     1.725     2.086
 21     1.721     2.080
 22     1.717     2.074
 23     1.714     2.069
 24     1.711     2.064
 25     1.708     2.060
 26     1.706     2.056
 27     1.703     2.052
 28     1.701     2.048
 29     1.699     2.045
 30     1.697     2.042
 40     1.684     2.021
 60     1.671     2.000
120     1.658     1.980
  ∞     1.645     1.960


* Table E is abridged from Table III of Fisher and Yates: Statistical tables for biological, agricultural and medical research, Oliver and Boyd, Ltd., Edinburgh, by permission of the authors and publishers.
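
Table E is entered with n degrees of freedom, and an obtained t is compared with the tabled value at the chosen level. The following is only a sketch: it assumes a single-sample t with N − 1 degrees of freedom, the scores are hypothetical, and the dictionary copies a few .05 entries from the table.

```python
import math
import statistics

# .05 values copied from Table E for a few degrees of freedom (n).
T_AT_05 = {5: 2.571, 10: 2.228, 20: 2.086, 30: 2.042}

def one_sample_t(scores, hypothesized_mean):
    """Return (t, degrees of freedom) for a sample mean against a hypothesized mean."""
    count = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)              # uses N - 1 in the denominator
    standard_error = sd / math.sqrt(count)
    return (mean - hypothesized_mean) / standard_error, count - 1

scores = [12, 15, 11, 14, 13, 16]              # hypothetical data
t, df = one_sample_t(scores, 10)
significant = abs(t) > T_AT_05[df]
print(round(t, 2), df, significant)            # 4.58 5 True
```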
Table F. Table of F for .05 (roman), .01 (italic), and .001 (bold face)
levels of significance*


 n2  level   n1 = 1       2       3       4       5       6       8      12      24       ∞

  1   .05      161     200     216     225     230     234     239     244     249     254
      .01     4052    4999    5403    5625    5764    5859    5981    6106    6234    6366
      .001  405284  500000  540379  562500  576405  585937  598144  610667  623497  636619

  2   .05    18.51   19.00   19.16   19.25   19.30   19.33   19.37   19.41   19.45   19.50
      .01    98.49   99.01   99.17   99.25   99.30   99.33   99.36   99.42   99.46   99.50
      .001   998.5   999.0   999.2   999.2   999.3   999.3   999.4   999.4   999.5   999.5

  3   .05    10.13    9.55    9.28    9.12    9.01    8.94    8.84    8.74    8.64    8.53
      .01    34.12   30.81   29.46   28.71   28.24   27.91   27.49   27.05   26.60   26.12
      .001   167.5   148.5   141.1   137.1   134.6   132.8   130.6   128.3   125.9   123.5

  4   .05     7.71    6.94    6.59    6.39    6.26    6.16    6.04    5.91    5.77    5.63
      .01    21.20   18.00   16.69   15.98   15.52   15.21   14.80   14.37   13.93   13.46
      .001   74.14   61.25   56.18   53.44   51.71   50.53   49.00   47.41   45.77   44.05

  5   .05     6.61    5.79    5.41    5.19    5.05    4.95    4.82    4.68    4.53    4.36
      .01    16.26   13.27   12.06   11.39   10.97   10.67   10.27    9.89    9.47    9.02
      .001   47.04   36.61   33.20   31.09   29.75   28.84   27.64   26.42   25.14   23.78

  6   .05     5.99    5.14    4.76    4.53    4.39    4.28    4.15    4.00    3.84    3.67
      .01    13.74   10.92    9.78    9.15    8.75    8.47    8.10    7.72    7.31    6.88
      .001   35.51   27.00   23.70   21.90   20.81   20.03   19.03   17.99   16.89   15.75

  7   .05     5.59    4.74    4.35    4.12    3.97    3.87    3.73    3.57    3.41    3.23
      .01    12.25    9.55    8.45    7.85    7.46    7.19    6.84    6.47    6.07    5.65
      .001   29.22   21.69   18.77   17.19   16.21   15.52   14.63   13.71   12.73   11.69

  8   .05     5.32    4.46    4.07    3.84    3.69    3.58    3.44    3.28    3.12    2.93
      .01    11.26    8.65    7.59    7.01    6.63    6.37    6.03    5.67    5.28    4.86
      .001   25.42   18.49   15.83   14.39   13.49   12.86   12.04   11.19   10.30    9.34

  9   .05     5.12    4.26    3.86    3.63    3.48    3.37    3.23    3.07    2.90    2.71
      .01    10.56    8.02    6.99    6.42    6.06    5.80    5.47    5.11    4.73    4.31
      .001   22.86   16.39   13.90   12.56   11.71   11.13   10.37    9.57    8.72    7.81

 10   .05     4.96    4.10    3.71    3.48    3.33    3.22    3.07    2.91    2.74    2.54
      .01    10.04    7.56    6.55    5.99    5.64    5.39    5.06    4.71    4.33    3.91
      .001   21.04   14.91   12.55   11.28   10.48    9.92    9.20    8.45    7.64    6.76

 11   .05     4.84    3.98    3.59    3.36    3.20    3.09    2.95    2.79    2.61    2.40
      .01     9.65    7.20    6.22    5.67    5.32    5.07    4.74    4.40    4.02    3.60
      .001   19.69   13.81   11.56   10.35    9.58    9.05    8.35    7.63    6.85    6.00

 12   .05     4.75    3.88    3.49    3.26    3.11    3.00    2.85    2.69    2.50    2.30
      .01     9.33    6.93    5.95    5.41    5.06    4.82    4.50    4.16    3.78    3.36
      .001   18.64   12.97   10.80    9.63    8.89    8.38    7.71    7.00    6.25    5.42


* Table F is reprinted, in rearranged form, from Table V of Fisher and Yates: Statistical 
tables for biological, agricultural and medical research, Oliver and Boyd, Ltd., Edinburgh, by 
permission of the authors and publishers. 


Table F. Table of F for .05 (roman), .01 (italic), and .001 (bold face)
levels of significance* (continued)

 n2  level   n1 = 1       2       3       4       5       6       8      12      24       ∞

 13   .05     4.67    3.80    3.41    3.18    3.02    2.92    2.77    2.60    2.42    2.21
      .01     9.07    6.70    5.74    5.20    4.86    4.62    4.30    3.96    3.59    3.16
      .001   17.81   12.31   10.21    9.07    8.35    7.86    7.21    6.52    5.78    4.97

 14   .05     4.60    3.74    3.34    3.11    2.96    2.85    2.70    2.53    2.35    2.13
      .01     8.86    6.51    5.56    5.03    4.69    4.46    4.14    3.80    3.43    3.00
      .001   17.14   11.78    9.73    8.62    7.92    7.43    6.80    6.13    5.41    4.60

 15   .05     4.54    3.68    3.29    3.06    2.90    2.79    2.64    2.48    2.29    2.07
      .01     8.68    6.36    5.42    4.89    4.56    4.32    4.00    3.67    3.29    2.87
      .001   16.59   11.34    9.34    8.25    7.57    7.09    6.47    5.81    5.10    4.31

 16   .05     4.49    3.63    3.24    3.01    2.85    2.74    2.59    2.42    2.24    2.01
      .01     8.53    6.23    5.29    4.77    4.44    4.20    3.89    3.55    3.18    2.75
      .001   16.12   10.97    9.00    7.94    7.27    6.81    6.19    5.55    4.85    4.06

 17   .05     4.45    3.59    3.20    2.96    2.81    2.70    2.55    2.38    2.19    1.96
      .01     8.40    6.11    5.18    4.67    4.34    4.10    3.79    3.45    3.08    2.65
      .001   15.72   10.66    8.73    7.68    7.02    6.56    5.96    5.32    4.63    3.85

 18   .05     4.41    3.55    3.16    2.93    2.77    2.66    2.51    2.34    2.15    1.92
      .01     8.28    6.01    5.09    4.58    4.25    4.01    3.71    3.37    3.00    2.57
      .001   15.38   10.39    8.49    7.46    6.81    6.35    5.76    5.13    4.45    3.67

 19   .05     4.38    3.52    3.13    2.90    2.74    2.63    2.48    2.31    2.11    1.88
      .01     8.18    5.93    5.01    4.50    4.17    3.94    3.63    3.30    2.92    2.49
      .001   15.08   10.16    8.28    7.26    6.61    6.18    5.59    4.97    4.29    3.52

 20   .05     4.35    3.49    3.10    2.87    2.71    2.60    2.45    2.28    2.08    1.84
      .01     8.10    5.85    4.94    4.43    4.10    3.87    3.56    3.23    2.86    2.42
      .001   14.82    9.95    8.10    7.10    6.46    6.02    5.44    4.82    4.15    3.38

 21   .05     4.32    3.47    3.07    2.84    2.68    2.57    2.42    2.25    2.05    1.81
      .01     8.02    5.78    4.87    4.37    4.04    3.81    3.51    3.17    2.80    2.36
      .001   14.59    9.77    7.94    6.95    6.32    5.88    5.31    4.70    4.03    3.26

 22   .05     4.30    3.44    3.05    2.82    2.66    2.55    2.40    2.23    2.03    1.78
      .01     7.94    5.72    4.82    4.31    3.99    3.76    3.45    3.12    2.75    2.31
      .001   14.38    9.61    7.80    6.81    6.19    5.76    5.19    4.58    3.92    3.15

 23   .05     4.28    3.42    3.03    2.80    2.64    2.53    2.38    2.20    2.00    1.76
      .01     7.88    5.66    4.76    4.26    3.94    3.71    3.41    3.07    2.70    2.26
      .001   14.19    9.47    7.67    6.69    6.08    5.65    5.09    4.48    3.82    3.05

 24   .05     4.26    3.40    3.01    2.78    2.62    2.51    2.36    2.18    1.98    1.73
      .01     7.82    5.61    4.72    4.22    3.90    3.67    3.36    3.03    2.66    2.21
      .001   14.03    9.34    7.55    6.59    5.98    5.55    4.99    4.39    3.74    2.97

* Table F is reprinted, in rearranged form, from Table V of Fisher and Yates: Statistical
tables for biological, agricultural and medical research, Oliver and Boyd, Ltd., Edinburgh, by
permission of the authors and publishers.
Table F. Table of F for .05 (roman), .01 (italic), and .001 (bold face)
levels of significance* (continued)

 n2  level   n1 = 1       2       3       4       5       6       8      12      24       ∞

 25   .05     4.24    3.38    2.99    2.76    2.60    2.49    2.34    2.16    1.96    1.71
      .01     7.77    5.57    4.68    4.18    3.86    3.63    3.32    2.99    2.62    2.17
      .001   13.88    9.22    7.45    6.49    5.88    5.46    4.91    4.31    3.66    2.89

 26   .05     4.22    3.37    2.98    2.74    2.59    2.47    2.32    2.15    1.95    1.69
      .01     7.72    5.53    4.64    4.14    3.82    3.59    3.29    2.96    2.58    2.13
      .001   13.74    9.12    7.36    6.41    5.80    5.38    4.83    4.24    3.59    2.82

 27   .05     4.21    3.35    2.96    2.73    2.57    2.46    2.30    2.13    1.93    1.67
      .01     7.68    5.49    4.60    4.11    3.78    3.56    3.26    2.93    2.55    2.10
      .001   13.61    9.02    7.27    6.33    5.73    5.31    4.76    4.17    3.52    2.75

 28   .05     4.20    3.34    2.95    2.71    2.56    2.44    2.29    2.12    1.91    1.65
      .01     7.64    5.45    4.57    4.07    3.75    3.53    3.23    2.90    2.52    2.06
      .001   13.50    8.93    7.19    6.25    5.66    5.24    4.69    4.11    3.46    2.70

 29   .05     4.18    3.33    2.93    2.70    2.54    2.43    2.28    2.10    1.90    1.64
      .01     7.60    5.42    4.54    4.04    3.73    3.50    3.20    2.87    2.49    2.03
      .001   13.39    8.85    7.12    6.19    5.59    5.18    4.64    4.05    3.41    2.64

 30   .05     4.17    3.32    2.92    2.69    2.53    2.42    2.27    2.09    1.89    1.62
      .01     7.56    5.39    4.51    4.02    3.70    3.47    3.17    2.84    2.47    2.01
      .001   13.29    8.77    7.05    6.12    5.53    5.12    4.58    4.00    3.36    2.59

 40   .05     4.08    3.23    2.84    2.61    2.45    2.34    2.18    2.00    1.79    1.51
      .01     7.31    5.18    4.31    3.83    3.51    3.29    2.99    2.66    2.29    1.80
      .001   12.61    8.25    6.60    5.70    5.13    4.73    4.21    3.64    3.01    2.23

 60   .05     4.00    3.15    2.76    2.52    2.37    2.25    2.10    1.92    1.70    1.39
      .01     7.08    4.98    4.13    3.65    3.34    3.12    2.82    2.50    2.12    1.60
      .001   11.97    7.76    6.17    5.31    4.76    4.37    3.87    3.31    2.69    1.90

120   .05     3.92    3.07    2.68    2.45    2.29    2.17    2.02    1.83    1.61    1.25
      .01     6.85    4.79    3.95    3.48    3.17    2.96    2.66    2.34    1.95    1.38
      .001   11.38    7.31    5.79    4.95    4.42    4.04    3.55    3.02    2.40    1.56

  ∞   .05     3.84    2.99    2.60    2.37    2.21    2.09    1.94    1.75    1.52    1.00
      .01     6.64    4.60    3.78    3.32    3.02    2.80    2.51    2.18    1.79    1.00
      .001   10.83    6.91    5.42    4.62    4.10    3.74    3.27    2.74    2.13    1.00

* Table F is reprinted, in rearranged form, from Table V of Fisher and Yates: Statistical
tables for biological, agricultural and medical research, Oliver and Boyd, Ltd., Edinburgh, by
permission of the authors and publishers.
Table G. Squares and square roots


  N       N²        √N        √10N

 1.00    1.0000    1.00000    3.16228
 1.01    1.0201    1.00499    3.17805
 1.02    1.0404    1.00995    3.19374
 1.03    1.0609    1.01489    3.20936
 1.04    1.0816    1.01980    3.22490
 1.05    1.1025    1.02470    3.24037
 1.06    1.1236    1.02956    3.25576
 1.07    1.1449    1.03441    3.27109
 1.08    1.1664
 1.09    1.1881
 1.10    1.2100    1.04881    3.31662
 1.11    1.2321    1.05357    3.33167
 1.12    1.2544    1.05830    3.34664
 1.13    1.2769    1.06301    3.36155
 1.14    1.2996    1.06771    3.37639
 1.15    1.3225    1.07238    3.39116
 1.16    1.3456    1.07703    3.40588
 1.17    1.3689    1.08167    3.42053
 1.18    1.3924    1.08628    3.43511
 1.19    1.4161    1.09087    3.44964
 1.20    1.4400    1.09545    3.46410
 1.21    1.4641    1.10000    3.47851
 1.22    1.4884    1.10454    3.49285
 1.23    1.5129    1.10905    3.50714
 1.24    1.5376    1.11355    3.52136
 1.25    1.5625    1.11803    3.53553
 1.26    1.5876    1.12250    3.54965
 1.27    1.6129    1.12694    3.56371
 1.28    1.6384    1.13137    3.57771
 1.29    1.6641    1.13578    3.59166

 1.50    2.2500    1.22474    3.87298
 1.51    2.2801    1.22882    3.88587
 1.52    2.3104    1.23288    3.89872
 1.53    2.3409    1.23693    3.91152
 1.54    2.3716    1.24097    3.92428
 1.55    2.4025    1.24499    3.93700
 1.56    2.4336    1.24900    3.94968
 1.57    2.4649    1.25300    3.96232
 1.58    2.4964    1.25698    3.97492
 1.59    2.5281    1.26095    3.98748
 1.60    2.5600    1.26491    4.00000
 1.61    2.5921    1.26886    4.01248
 1.62    2.6244    1.27279    4.02492
 1.63    2.6569    1.27671    4.03733
 1.64    2.6896    1.28062    4.04969
 1.65    2.7225    1.28452    4.06202
 1.66    2.7556    1.28841    4.07431
 1.67    2.7889    1.29228    4.08656
 1.68    2.8224    1.29615    4.09878
 1.69    2.8561    1.30000    4.11096

 1.71    2.9241    1.30767    4.13521
 1.72    2.9584    1.31149    4.14729
 1.73    2.9929    1.31529    4.15933
 1.74    3.0276    1.31909    4.17133
 1.75    3.0625    1.32288    4.18330
 1.76    3.0976    1.32665    4.19524
 1.77    3.1329    1.33041    4.20714
 1.78    3.1684    1.33417    4.21900
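
Each row of Table G holds N, N², √N, and √10N, and the two root columns together serve for numbers outside the tabled range by shifting the decimal point two places at a time. The sketch below is an illustration under that assumption; both function names are hypothetical.

```python
import math

def table_g_row(n):
    """Return (N squared, root of N, root of 10N) rounded as in Table G."""
    return round(n * n, 4), round(math.sqrt(n), 5), round(math.sqrt(10 * n), 5)

def sqrt_from_table(x):
    """Square root of a positive x via the root-N and root-10N columns."""
    if x <= 0:
        raise ValueError("x must be positive")
    shift = 0
    n = x
    while n >= 100.0:      # move the decimal point two places left
        n /= 100.0
        shift += 1
    while n < 1.0:         # or two places right
        n *= 100.0
        shift -= 1
    # n now lies between 1 and 100: use root N below 10, root 10N otherwise.
    root = math.sqrt(n) if n < 10.0 else math.sqrt(10.0 * (n / 10.0))
    return root * 10 ** shift

print(table_g_row(1.50))         # (2.25, 1.22474, 3.87298)
print(sqrt_from_table(150.0))    # about 12.2474, i.e. 10 times root 1.50
print(sqrt_from_table(15.0))     # about 3.87298, the root-10N entry for 1.50
```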
Table G. Squares and square roots (continued)


N

N²

√N

√10N


N

N²

√N

√10N

2.00 

4.0000 

1.41421 

4.47214 


2.50 

6.2500 

1.58114 

5.00000 

2.01 

4.0401 

1.41774 

4.48330 


2.51 

6.3001 

1.58430 

5.00999 

2.02 

4.0804 

1.42127 

4.49444 


2.52 

6.3504 

1.58745 

5.01996 

2.03 

4.1209 

1.42478 

4.50555 


2.53 

6.4009 

1.59060 

5.02991 

2.04 

4.1616 

1.42829 

4.51664 


2.54 

6.4516 

1.59374 

5.03984 

2.05 

4.2025 

1.43178 

4.52769 


2.55 

6.5025 

1.59687 

5.04975 

2.06 

4.2436 

1.43527 

4.53872 


2.56 

6.5536 

1.60000 

5.05964 

2.07 

4.2849 

1.43875 

4.54973 


2.57 

6.6049 

1.60312 

5.06952 

2.08 

4.3264 

1.44222 

4.56070 


2.58 

6.6564 

1.60624 

5.07937 

2.09 

4.3681 

1.44568 

4.57165 


2.59 

6.7081 

1.60935 

5.08920 

2.10 

4.4100 

1.44914 

4.58258 


2.60 

6.7600 

1.61245 

5.09902 

2.11 

4.4521 

1.45258 

4.59347 


2.61 

6.8121 

1.61555 

5.10882 

2.12 

4.4944 

1.45602 

4.60435 


2.62 

6.8644 

1.61864 

5.11859 

2.13 

4.5369 

1.45945 

4.61519 


2.63 

6.9169 

1.62173 

5.12835 

2.14 

4.5796 

1.46287 

4.62601 


2.64 

6.9696 

1.62481 

5.13809 

2.15 

4.6225 

1.46629 

4.63681 


2.65 

7.0225 

1.62788 

5.14782 

2.16 

4.6656 

1.46969 

4.64758 


2.66 

7.0756 

1.63095 

5.15752 

2.17 

4.7089 

1.47309 

4.65833 


2.67 

7.1289 

1.63401 

5.16720 

2.18 

4.7524 

1.47648 

4.66905 


2.68 

7.1824 

1.63707 

5.17687 

2.19 

4.7961 

1.47986 

4.67974 


2.69 

7.2361 

1.64012 

5.18652 

2.20 

4.8400 

1.48324 

4.69042 


2.70 

7.2900 

1.64317 

5.19615 

2.21 

4.8841 

1.48661 

4.70106 


2.71 

7.3441 

1.64621 

5.20577 

2.22 

4.9284 

1.48997 

4.71169 


2.72 

7.3984 

1.64924 

5.21536 

2.23 

4.9729 

1.49332 

4.72229 


2.73 

7.4529 

1.65227 

5.22494 

2.24 

5.0176 

1.49666 

4.73286 


2.74 

7.5076 

1.65529 

5.23450 

2.25 

5.0625 

1.50000 

4.74342 


2.75 

7.5625 

1.65831 

5.24404 

2.26 

5.1076

1.50333 

4.75395 


2.76 

7.6176 

1.66132 

5.25357 

2.27 

5.1529 

1.50665 

4.76445 


2.77 

7.6729 

1.66433 

5.26308 

2.28 

5.1984 

1.50997 

4.77493 


2.78 

7.7284 

1.66733 

5.27257 

2.29 

5.2441 

1.51327 

4.78539 


2.79 

7.7841 

1.67033 

5.28205 

2.30 

5.2900 

1.51658

4.79583 


2.80 

7.8400 

1.67332 

5.29150 

2.31 

5.3361 

1.51987 

4.80625 


2.81 

7.8961 

1.67631 

5.30094 

2.32 

5.3824 

1.52315 

4.81664 


2.82 

7.9524 

1.67929 

5.31037 

2.33 

5.4289 

1.52643 

4.82701 


2.83 

8.0089 

1.68226 

5.31977 

2.34 

5.4756 

1.52971 

4.83735 


2.84 

8.0656 

1.68523 

5.32917 

2.35 

5.5225 

1.53297 

4.84768 


2.85 

8.1225 

1.68819 

5.33854 

2.36 

5.5696 

1.53623 

4.85798 


2.86 

8.1796 

1.69115 

5.34790 

2.37 

5.6169 

1.53948 

4.86826 


2.87 

8.2369 

1.69411 

5.35724 

2.38 

5.6644 

1.54272 

4.87852 


2.88 

8.2944 

1.69706 

5.36656 

2.39 

5.7121 

1.54596 

4.88876 


2.89 

8.3521 

1.70000 

5.37587 

2.40 

5.7600 

1.54919 

4.89898 


2.90 

8.4100 

1.70294 

5.38516 

2.41 

5.8081 

1.55242 

4.90918 


2.91 

8.4681 

1.70587 

5.39444 

2.42 

5.8564 

1.55563 

4.91935 


2.92 

8.5264 

1.70880 

5.40370 

2.43 

5.9049 

1.55885 

4.92950 


2.93 

8.5849 

1.71172 

5.41295 

2.44 

5.9536 

1.56205 

4.93964 


2.94 

8.6436 

1.71464 

5.42218 

2.45 

6.0025 

1.56525 

4.94975 


2.95 

8.7025 

1.71756 

5.43139 

2.46 

6.0516 

1.56844 

4.95984 


2.96 

8.7616 

1.72047 

5.44059 

2.47 

6.1009 

1.57162 

4.96991 


2.97 

8.8209 

1.72337 

5.44977 

2.48 

6.1504 

1.57480 

4.97996 


2.98 

8.8804 

1.72627 

5.45894 

2.49 

6.2001 

1.57797 

4.98999 


2.99 

8.9401 

1.72916 

5.46809 

2.50 

6.2500 

1.58114 

5.00000 


3.00 

9.0000 

1.73205 

5.47723 

N

N²

√N

√10N


N

N²

√N

√10N

Table G. Squares and square roots (continued)


N

N²

√N

√10N


N

N²

√N

√10N

3.00 

9.0000 

1.73205 

5.47723 


3.50 

12.2500 

1.87083 

5.91608 

3.01 

9.0601 

1.73494 

5.48635 


3.51 

12.3201 

1.87350 

5.92453 

3.02 

9.1204 

1.73781 

5.49545 


3.52 

12.3904 

1.87617 

5.93296 

3.03 

9.1809 

1.74069 

5.50454 


3.53 

12.4609 

1.87883 

5.94138 

3.04 

9.2416 

1.74356 

5.51362 


3.54 

12.5316 

1.88149 

5.94979 

3.05 

9.3025 

1.74642 

5.52268 


3.55 

12.6025 

1.88414 

5.95819 

3.06 

9.3636 

1.74929 

5.53173 


3.56 

12.6736 

1.88680 

5.96657 

3.07 

9.4249 

1.75214 

5.54076 


3.57 

12.7449 

1.88944 

5.97495 

3.08 

9.4864 

1.75499 

5.54977 


3.58 

12.8164 

1.89209 

5.98331 

3.09 

9.5481 

1.75784 

5.55878 


3.59 

12.8881 

1.89473 

5.99166 

3.10 

9.6100 

1.76068 

5.56776 


3.60 

12.9600 

1.89737 

6.00000 

3.11 

9.6721 

1.76352 

5.57674 


3.61 

13.0321 

1.90000 

6.00833 

3.12 

9.7344 

1.76635 

5.58570 


3.62 

13.1044 

1.90263 

6.01664 

3.13 

9.7969 

1.76918 

5.59464 


3.63 

13.1769 

1.90526 

6.02495 

3.14 

9.8596 

1.77200 

5.60357 


3.64 

13.2496 

1.90788 

6.03324 

3.15 

9.9225 

1.77482 

5.61249 


3.65 

13.3225 

1.91050 

6.04152 

3.16 

9.9856 

1.77764 

5.62139 


3.66 

13.3956 

1.91311 

6.04979 

3.17 

10.0489 

1.78045 

5.63028 


3.67 

13.4689 

1.91572 

6.05805 

3.18 

10.1124 

1.78326 

5.63915 


3.68 

13.5424 

1.91833 

6.06630 

3.19 

10.1761 

1.78606 

5.64801 


3.69 

13.6161 

1.92094 

6.07454 

3.20 

10.2400 

1.78885 

5.65685 


3.70 

13.6900 

1.92354 

6.08276 

3.21 

10.3041 

1.79165 

5.66569 


3.71 

13.7641 

1.92614 

6.09098 

3.22 

10.3684 

1.79444 

5.67450 


3.72 

13.8384 

1.92873 

6.09918 

3.23 

10.4329 

1.79722 

5.68331 


3.73 

13.9129 

1.93132 

6.10737 

3.24 

10.4976 

1.80000 

5.69210 


3.74 

13.9876 

1.93391 

6.11555 

3.25 

10.5625 

1.80278 

5.70088 


3.75 

14.0625 

1.93649 

6.12372 

3.26 

10.6276 

1.80555 

5.70964 


3.76

14.1376 

1.93907 

6.13188 

3.27

10.6929 

1.80831 

5.71839 


3.77

14.2129 

1.94165 

6.14003 

3.28 

10.7584 

1.81108 

5.72713 


3.78 

14.2884 

1.94422 

6.14817 

3.29 

10.8241 

1.81384 

5.73585 


3.79 

14.3641 

1.94679 

6.15630 

3.30 

10.8900 

1.81659 

5.74456 


3.80 

14.4400 

1.94936 

6.16441 

3.31 

10.9561 

1.81934 

5.75326 


3.81 

14.5161 

1.95192 

6.17252 

3.32 

11.0224 

1.82209 

5.76194 


3.82 

14.5924 

1.95448 

6.18061 

3.33 

11.0889 

1.82483 

5.77062 


3.83

14.6689 

1.95704 

6.18870 

3.34 

11.1556 

1.82757 

5.77927 


3.84

14.7456 

1.95959 

6.19677 

3.35 

11.2225 

1.83030 

5.78792 


3.85 

14.8225 

1.96214 

6.20484 

3.36 

11.2896 

1.83303 

5.79655 


3.86

14.8996 

1.96469 

6.21289 

3.37 

11.3569 

1.83576 

5.80517 


3.87 

14.9769 

1.96723 

6.22093 

3.38 

11.4244 

1.83848 

5.81378 


3.88 

15.0544 

1.96977 

6.22896 

3.39 

11.4921 

1.84120 

5.82237 


3.89 

15.1321 

1.97231 

6.23699 

3.40 

11.5600 

1.84391 

5.83095 


3.90 

15.2100 

1.97484 

6.24500 

3.41 

11.6281 

1.84662 

5.83952 


3.91 

15.2881 

1.97737 

6.25300 

3.42 

11.6964 

1.84932 

5.84808 


3.92 

15.3664 

1.97990 

6.26099 

3.43 

11.7649 

1.85203 

5.85662 


3.93 

15.4449 

1.98242 

6.26897

3.44 

11.8336 

1.85472 

5.86515 


3.94 

15.5236 

1.98494 

6.27694 

3.45 

11.9025 

1.85742 

5.87367 


3.95 

15.6025 

1.98746 

6.28490 

3.46 

11.9716 

1.86011 

5.88218 


3.96 

15.6816 

1.98997 

6.29285 

3.47 

12.0409 

1.86279 

5.89067 


3.97 

15.7609 

1.99249 

6.30079 

3.48 

12.1104 

1.86548 

5.89915 


3.98 

15.8404 

1.99499 

6.30872 

3.49 

12.1801 

1.86815 

5.90762 


3.99 

15.9201 

1.99750 

6.31664 

3.50 

12.2500 

1.87083 

5.91608 


4.00 

16.0000 

2.00000 

6.32456 

N

N²

√N

√10N


N

N²

√N

√10N

Table G. Squares and square roots (continued)


N

N²

√N

√10N


N

N²

√N

√10N

4.00 

16.0000 

2.00000 

6.32456 


4.50 

20.2500 

2.12132 

6.70820 

4.01 

16.0801 

2.00250 

6.33246 


4.51 

20.3401 

2.12368 

6.71565

4.02 

16.1604 

2.00499 

6.34035 


4.52 

20.4304 

2.12603 

6.72309 

4.03 

16.2409 

2.00749 

6.34823 


4.53 

20.5209 

2.12838 

6.73053 

4.04 

16.3216 

2.00998 

6.35610 


4.54 

20.6116 

2.13073 

6.73795 

4.05 

16.4025 

2.01246 

6.36396 


4.55 

20.7025 

2.13307 

6.74537 

4.06 

16.4836 

2.01494 

6.37181 


4.56 

20.7936 

2.13542 

6.75278 

4.07 

16.5649 

2.01742 

6.37966 


4.57 

20.8849 

2.13776 

6.76018 

4.08 

16.6464 

2.01990 

6.38749 


4.58 

20.9764 

2.14009 

6.76757 

4.09 

16.7281 

2.02237 

6.39531 


4.59 

21.0681 

2.14243 

6.77495 

4.10 

16.8100 

2.02485 

6.40312 


4.60 

21.1600 

2.14476 

6.78233 

4.11 

16.8921 

2.02731 

6.41093 


4.61 

21.2521 

2.14709 

6.78970 

4.12 

16.9744 

2.02978 

6.41872 


4.62 

21.3444 

2.14942 

6.79706 

4.13 

17.0569 

2.03224 

6.42651 


4.63 

21.4369 

2.15174 

6.80441 

4.14 

17.1396 

2.03470 

6.43428 


4.64 

21.5296 

2.15407 

6.81175 

4.15 

17.2225 

2.03715 

6.44205 


4.65 

21.6225 

2.15639 

6.81909 

4.16 

17.3056 

2.03961 

6.44981 


4.66 

21.7156 

2.15870 

6.82642 

4.17 

17.3889 

2.04206 

6.45755 


4.67 

21.8089 

2.16102 

6.83374 

4.18 

17.4724 

2.04450 

6.46529 


4.68 

21.9024 

2.16333 

6.84105 

4.19 

17.5561 

2.04695 

6.47302 


4.69 

21.9961 

2.16564 

6.84836 

4.20 

17.6400 

2.04939 

6.48074 


4.70 

22.0900 

2.16795 

6.85565 

4.21 

17.7241 

2.05183 

6.48845 


4.71 

22.1841 

2.17025 

6.86294 

4.22 

17.8084 

2.05426 

6.49615 


4.72 

22.2784 

2.17256 

6.87023 

4.23 

17.8929 

2.05670 

6.50384 


4.73 

22.3729 

2.17486 

6.87750 

4.24 

17.9776 

2.05913 

6.51153 


4.74 

22.4676 

2.17715 

6.88477 

4.25 

18.0625 

2.06155 

6.51920 


4.75 

22.5625 

2.17945 

6.89202 

4.26 

18.1476 

2.06398 

6.52687 


4.76 

22.6576 

2.18174 

6.89928 

4.27 

18.2329 

2.06640 

6.53452 


4.77 

22.7529 

2.18403 

6.90652 

4.28 

18.3184 

2.06882 

6.54217 


4.78 

22.8484 

2.18632 

6.91375 

4.29 

18.4041 

2.07123 

6.54981 


4.79 

22.9441 

2.18861 

6.92098 

4.30 

18.4900 

2.07364 

6.55744 


4.80 

23.0400 

2.19089 

6.92820 

4.31 

18.5761 

2.07605 

6.56506 


4.81 

23.1361 

2.19317 

6.93542 

4.32 

18.6624 

2.07846 

6.57267 


4.82 

23.2324 

2.19545 

6.94262 

4.33 

18.7489 

2.08087 

6.58027 


4.83 

23.3289 

2.19773 

6.94982 

4.34 

18.8356 

2.08327 

6.58787 


4.84 

23.4256 

2.20000 

6.95701 

4.35 

18.9225 

2.08567 

6.59545 


4.85 

23.5225 

2.20227 

6.96419 

4.36 

19.0096 

2.08806 

6.60303 


4.86 

23.6196 

2.20454 

6.97137 

4.37

19.0969 

2.09045 

6.61060 


4.87 

23.7169 

2.20681 

6.97854 

4.38 

19.1844 

2.09284 

6.61816 


4.88 

23.8144 

2.20907 

6.98570 

4.39 

19.2721 

2.09523 

6.62571 


4.89 

23.9121 

2.21133 

6.99285 

4.40 

19.3600 

2.09762 

6.63325 


4.90 

24.0100 

2.21359 

7.00000 

4.41 

19.4481 

2.10000 

6.64078 


4.91 

24.1081 

2.21585 

7.00714 

4.42 

19.5364 

2.10238 

6.64831 


4.92 

24.2064 

2.21811 

7.01427 

4.43 

19.6249 

2.10476 

6.65582 


4.93 

24.3049 

2.22036 

7.02140 

4.44 

19.7136 

2.10713 

6.66333 


4.94 

24.4036 

2.22261 

7.02851 

4.45 

19.8025 

2.10950 

6.67083 


4.95 

24.5025 

2.22486 

7.03562 

4.46 

19.8916 

2.11187 

6.67832 


4.96 

24.6016 

2.22711 

7.04273 

4.47 

19.9809 

2.11424 

6.68581 


4.97 

24.7009 

2.22935 

7.04982 

4.48 

20.0704 

2.11660 

6.69328 


4.98 

24.8004 

2.23159 

7.05691 

4.49 

20.1601 

2.11896 

6.70075 


4.99 

24.9001 

2.23383 

7.06399 

4.50 

20.2500 

2.12132 

6.70820 


5.00 

25.0000 

2.23607 

7.07107 

N

N²

√N

√10N


N

N²

√N

√10N

Table G. Squares and square roots (continued)


N

N²

√N

√10N

5.00 

25.0000 

2.23607 

7.07107 

5.01 

25.1001 

2.23830 

7.07814 

5.02 

25.2004 

2.24054 

7.08520 

5.03 

25.3009 

2.24277 

7.09225 

5.04 

25.4016 

2.24499 

7.09930 

5.05 

25.5025 

2.24722 

7.10634 

5.06 

25.6036 

2.24944 

7.11337 

5.07 

25.7049 

2.25167 

7.12039 

5.08 

25.8064 

2.25389 

7.12741 

5.09 

25.9081 

2.25610 

7.13442 

5.10 

26.0100 

2.25832 

7.14143 

5.11 

26.1121 

2.26053 

7.14843 

5.12 

26.2144 

2.26274 

7.15542 

5.13 

26.3169 

2.26495 

7.16240 

5.14 

26.4196 

2.26716 

7.16938 

5.15 

26.5225 

2.26936 

7.17635 

5.16 

26.6256 

2.27156 

7.18331 

5.17 

26.7289 

2.27376 

7.19027 

5.18 

26.8324 

2.27596 

7.19722 

5.19 

26.9361 

2.27816 

7.20417 

5.20 

27.0400 

2.28035 

7.21110 

5.21 

27.1441 

2.28254 

7.21803 

5.22 

27.2484 

2.28473 

7.22496

5.23 

27.3529 

2.28692 

7.23187 

5.24 

27.4576 

2.28910 

7.23878 

5.25 

27.5625 

2.29129 

7.24569 

5.26 

27.6676 

2.29347 

7.25259 

5.27 

27.7729 

2.29565 

7.25948 

5.28 

27.8784 

2.29783 

7.26636 

5.29 

27.9841 

2.30000 

7.27324 

5.30 

28.0900 

2.30217 

7.28011 

5.31 

28.1961 

2.30434 

7.28697 

5.32 

28.3024 

2.30651 

7.29383 

5.33 

28.4089 

2.30868 

7.30068 

5.34 

28.5156 

2.31084 

7.30753 

5.35 

28.6225 

2.31301 

7.31437 

5.36 

28.7296 

2.31517 

7.32120 

5.37 

28.8369 

2.31733 

7.32803 

5.38 

28.9444 

2.31948 

7.33485 

5.39 

29.0521 

2.32164 

7.34166 

5.40 

29.1600 

2.32379 

7.34847 

5.41 

29.2681 

2.32594 

7.35527 

5.42 

29.3764 

2.32809 

7.36206 

5.43 

29.4849 

2.33024 

7.36885 

5.44 

29.5936 

2.33238 

7.37564 

5.45 

29.7025 

2.33452 

7.38241 

5.46 

29.8116 

2.33666 

7.38918 

5.47 

29.9209 

2.33880 

7.39594 

5.48 

30.0304 

2.34094 

7.40270 

5.49 

30.1401 

2.34307 

7.40945 

5.50 

30.2500 

2.34521 

7.41620 

N

N²

√N

√10N


N

N²

√N

√10N

5.50 

30.2500 

2.34521 

7.41620 

5.51 

30.3601 

2.34734 

7.42294 

5.52 

30.4704 

2.34947 

7.42967 

5.53 

30.5809 

2.35160 

7.43640 

5.54 

30.6916 

2.35372 

7.44312 

5.55 

30.8025 

2.35584 

7.44983 

5.56 

30.9136 

2.35797 

7.45654 

5.57 

31.0249 

2.36008 

7.46324 

5.58 

31.1364 

2.36220 

7.46994 

5.59 

31.2481 

2.36432 

7.47663 

5.60 

31.3600 

2.36643 

7.48331 

5.61 

31.4721 

2.36854 

7.48999 

5.62 

31.5844 

2.37065 

7.49667 

5.63 

31.6969 

2.37276 

7.50333 

5.64 

31.8096 

2.37487 

7.50999 

5.65 

31.9225 

2.37697 

7.51665 

5.66 

32.0356 

2.37908 

7.52330 

5.67 

32.1489 

2.38118 

7.52994 

5.68 

32.2624 

2.38328 

7.53658 

5.69 

32.3761 

2.38537 

7.54321 

5.70 

32.4900 

2.38747 

7.54983 

5.71 

32.6041 

2.38956 

7.55645 

5.72 

32.7184 

2.39165 

7.56307 

5.73 

32.8329 

2.39374 

7.56968 

5.74 

32.9476 

2.39583 

7.57628 

5.75 

33.0625 

2.39792 

7.58288 

5.76 

33.1776 

2.40000 

7.58947 

5.77 

33.2929 

2.40208 

7.59605 

5.78 

33.4084 

2.40416 

7.60263 

5.79 

33.5241 

2.40624 

7.60920 

5.80 

33.6400 

2.40832 

7.61577 

5.81 

33.7561 

2.41039 

7.62234 

5.82 

33.8724 

2.41247 

7.62889 

5.83 

33.9889 

2.41454 

7.63544 

5.84 

34.1056 

2.41661 

7.64199 

5.85 

34.2225 

2.41868 

7.64853 

5.86 

34.3396 

2.42074 

7.65506 

5.87 

34.4569 

2.42281 

7.66159 

5.88 

34.5744 

2.42487 

7.66812 

5.89 

34.6921 

2.42693 

7.67463 

5.90 

34.8100 

2.42899 

7.68115 

5.91 

34.9281 

2.43105 

7.68765 

5.92 

35.0464 

2.43311 

7.69415 

5.93 

35.1649 

2.43516 

7.70065 

5.94 

35.2836 

2.43721 

7.70714 

5.95 

35.4025 

2.43926 

7.71362 

5.96 

35.5216 

2.44131 

7.72010 

5.97 

35.6409 

2.44336 

7.72658 

5.98 

35.7604 

2.44540 

7.73305 

5.99 

35.8801 

2.44745 

7.73951 

6.00 

36.0000 

2.44949 

7.74597 

N

N²

√N

√10N
Table G. Squares and square roots (continued)


N 


N²


7.00 49.0000 


7.01 49.1401 

7.02 49.2804 

7.03 49.4209 

7.04 49.5616 

7.05 49.7025 

7.06 49.8436 


7.07 49.9849 2.65895 

7.08 50.1264 2.66083 

7.09 50.2681 2.66271 


7.10 50.4100 2.66458 



2.65330 

2.65518 

2.65707 


7.11 

7.12 

7.13 


50.5521 

50.6944 

50.8369 


7.14 50.9796 

7.15 51.1225 

7.16 51.2656 


7.17 

7.18 

7.19 


51.4089 

51.5524 

51.6961 


2.66646 

2.66833 

2.67021 

2.67208 

2.67395 

2.67582 

2.67769 

2.67955 

2.68142 


7.20 51.8400 2.68328


7.21 

7.22 

7.23 

7.24 

7.25 

7.26 

7.27 

7.28 

7.29 


51.9841 

52.1284 

52.2729 

52.4176 

52.5625 

52.7076 

52.8529 

52.9984 

53.1441 


7.30 53.2900


7.31 

7.32 

7.33 


2.68514 

2.68701 

2.68887 


8.42615 

8.43208 

8.43801 

8.44393 

8.44985 

8.45577 

8.46168 

8.46759 

8.47349 

8.47939 

8.48528 


8.49117 

8.49706 

8.50294 


2.69072 8.50882 

2.69258 8.51469 

2.69444 8.52056 


2.69629 

2.69815 

2.70000 


2.70185 


7.34 

7.35 

7.36 

7.37 

7.38 

7.39 


53.4361 

53.5824 

53.7289 

53.8756 

54.0225 

54.1696 

54.3169 

54.4644 

54.6121 


7.40 54.7600 


7.41 

7.42 

7.43 

7.44 

7.45 

7.46 

7.47 

7.48 

7.49 


54.9081 

55.0564 

55.2049 

55.3536 

55.5025 

55.6516 

55.8009 

55.9504 

56.1001 


7.50 

N 


56.2500 


N²


2.70370 

2.70555 

2.70740 

2.70924 

2.71109 

2.71293 

2.71477 

2.71662 

2.71846 


8.52643 

8.53229

8.53815 


8.54400 


2.72029 


2.72213 

2.72397 

2.72580 

2.72764 

2.72947 

2.73130 

2.73313 

2.73496 

2.73679 


8.54985 

8.55570 

8.56154 

8.56738 

8.57321 

8.57904 

8.58487 

8.59069 

8.59651 


8.60233 


8.60814 

8.61394 

8.61974 

8.62554 

8.63134 

8.63713 

8.64292 

8.64870 

8.65448 


2.73861 


√N


8.66025 


√10N


N 


7.50 


7.51 

7.52 

7.53 

7.54 

7.55 

7.56 

7.57 

7.58 

7.59 


7.60 


7.61 

7.62 

7.63 

7.64 

7.65 

7.66 

7.67 

7.68 

7.69 


7.70 


7.71 

7.72 

7.73 

7.74 

7.75 

7.76 

7.77 

7.78 

7.79 


N²


√N


56.2500 


56.4001 

56.5504 

56.7009 

56.8516 

57.0025 

57.1536 

57.3049 

57.4564 

57.6081 


57.7600 


2.73861 


2.74044 

2.74226 

2.74408 

2.74591 

2.74773 

2.74955 

2.75136 

2.75318 

2.75500 


√10N


8.66025 


2.75681 


57.9121 

58.0644 

58.2169 

58.3696 

58.5225 

58.6756 

58.8289 

58.9824 

59.1361 


59.2900


7.80 

7.81

7.82 

7.83 

7.84 

7.85 

7.86 

7.87 

7.88 

7.89 


7.90 


7.91 

7.92 

7.93 

7.94 

7.95 

7.96 

7.97 

7.98 

7.99 


8.00 


N 


59.4441 

59.5984 

59.7529 

59.9076 

60.0625 

60.2176 

60.3729 

60.5284 

60.6841 


60.8400 


60.9961 

61.1524 

61.3089 

61.4656 

61.6225 

61.7796 

61.9369 

62.0944 

62.2521 


62.4100 


62.5681 

62.7264 

62.8849 

63.0436 

63.2025 

63.3616 

63.5209 

63.6804 

63.8401 


2.75862 

2.76043 

2.76225 

2.76405 

2.76586 

2.76767 

2.76948 

2.77128 

2.77308 


8.66603 

8.67179 

8.67756 

8.68332 

8.68907 

8.69483 

8.70057 

8.70632 

8.71206 


8.71780 


8.72353 

8.72926 

8.73499 

8.74071 

8.74643 

8.75214 

8.75785 

8.76356 

8.76926 


2.77489 


2.77669 

2.77849 

2.78029 

2.78209 

2.78388 

2.78568 

2.78747 

2.78927 

2.79106 


8.77496 


8.78066 

8.78635 

8.79204 

8.79773 

8.80341 

8.80909 

8.81476 

8.82043 

8.82610 


2.79285 


2.79464 

2.79643 

2.79821 

2.80000 

2.80179 

2.80357 

2.80535 

2.80713 

2.80891 


2.81069 


8.83176 


8.83742 

8.84308 

8.84873 

8.85438 

8.86002 

8.86566 

8.87130 

8.87694 

8.88257 


2.81247

2.81425

2.81603

2.81780

2.81957

2.82135

2.82312

2.82489

2.82666


64.0000 


N²


2.82843 


√N


8.88819 


8.89382 

8.89944 

8.90505 

8.91067 

8.91628 

8.92188 

8.92749 

8.93308 

8.93868 


8.94427 


√10N
Table G. Squares and square roots (continued)


N

N²

√N

√10N

8.00 

64.0000 

2.82843 

8.94427 

8.01 

8.02 

8.03 

64.1601 

64.3204 

64.4809 

2.83019 

2.83196 

2.83373 

8.94986 

8.95545 

8.96103 

8.04 

8.05 

8.06 

64.6416 

64.8025 

64.9636 

2.83549 

2.83725 

2.83901 

8.96660 

8.97218 

8.97775 

8.07 

8.08 

8.09 

65.1249 

65.2864 

65.4481 

2.84077 

2.84253 

2.84429 

8.98332 

8.98888 

8.99444 

8.10 

65.6100 

2.84605 

9.00000 

8.11 

8.12 

8.13 

65.7721 

65.9344 

66.0969 

2.84781 

2.84956 

2.85132 

9.00555 

9.01110 

9.01665 

8.14 

8.15 

8.16 

66.2596 

66.4225 

66.5856 

2.85307 

2.85482 

2.85657 

9.02219 

9.02774 

9.03327 

8.17 

8.18 
8.19 

66.7489 

66.9124 

67.0761 

2.85832 

2.86007 

2.86182 

9.03881 

9.04434 

9.04986 

8.20 

67.2400 

2.86356 

9.05539 

8.21 

8.22 

8.23 

67.4041 

67.5684 

67.7329 

2.86531 

2.86705 

2.86880 

9.06091 

9.06642 

9.07193 

8.24 

8.25 

8.26 

67.8976 
68.0625 
68.2276

2.87054 

2.87228 

2.87402 

9.07744 

9.08295 

9.08845 

8.27 

8.28 
8.29 

68.3929 

68.5584 

68.7241 

2.87576 

2.87750 

2.87924 

9.09395 

9.09945 

9.10494 

8.30 

68.8900 

2.88097 

9.11043 

8.31 

8.32 

8.33 

69.0561 

69.2224 

69.3889 

2.88271 

2.88444 

2.88617 

9.11592 

9.12140 

9.12688 

8.34 

8.35 

8.36 

69.5556 

69.7225 

69.8896 

2.88791 

2.88964 

2.89137 

9.13236 

9.13783 

9.14330 

8.37 

8.38 

8.39 

70.0569 

70.2244 

70.3921 

2.89310 

2.89482 

2.89655 

9.14877 

9.15423 

9.15969 

8.40 

70.5600 

2.89828 

9.16515

8.41 

8.42 

8.43 

70.7281 

70.8964 

71.0649 

2.90000 

2.90172 

2.90345 

9.17061 

9.17606 

9.18150 

8.44 

8.45 

8.46 

71.2336 

71.4025 

71.5716 

2.90517 

2.90689 

2.90861 

9.18695 

9.19239 

9.19783 

8.47 

8.48 

8.49 

71.7409 
71.9104 
72.0801

2.91033

2.91204

2.91376

9.20326 

9.20869 

9.21412 

8.50 

72.2500

2.91548

9.21954 

N

N²

√N

√10N


N

N²

√N

√10N

8.50 

72.2500 

2.91548 

9.21954 


8.51 

8.52 

8.53 

72.4201 

72.5904 

72.7609 

2.91719 

2.91890 

2.92062 

9.22497 

9.23038 

9.23580 


8.54 

8.55 

8.56 

72.9316 

73.1025 

73.2736 

2.92233 

2.92404 

2.92575 

9.24121 

9.24662

9.25203 


8.57 

8.58 

8.59 

73.4449 

73.6164 

73.7881 

2.92746 

2.92916 

2.93087 

9.25743 

9.26283 

9.26823 


8.60 

73.9600 

2.93258 

9.27362 


8.61 

8.62 

8.63 

74.1321 

74.3044 

74.4769 

2.93428 

2.93598 

2.93769 

9.27901 

9.28440 

9.28978


8.64 

8.65 

8.66 

74.6496 

74.8225 

74.9956 

2.93939 

2.94109 

2.94279 

9.29516 

9.30054 

9.30591 


8.67 

8.68
8.69 

75.1689 

75.3424 

75.5161 

2.94449 

2.94618 

2.94788 

9.31128 

9.31665

9.32202 


8.70 

75.6900 

2.94958 

9.32738 


8.71 

8.72 

8.73 

75.8641 
76.0384
76.2129

2.95127 

2.95296 

2.95466 

9.33274 

9.33809 

9.34345 


8.74 

8.75 

8.76 

76.3876 

76.5625 

76.7376 

2.95635 

2.95804 

2.95973 

9.34880 

9.35414 

9.35949 


8.77 

8.78 

8.79 

76.9129 

77.0884 

77.2641 

2.96142 

2.96311 

2.96479 

9.36483 

9.37017 

9.37550 


8.80 

77.4400 

2.96648 

9.38083 


8.81 

8.82 

8.83 

77.6161 

77.7924 

77.9689 

2.96816

2.96985 

2.97153 

9.38616 

9.39149 

9.39681 


8.84 

8.85 

8.86 

78.1456 

78.3225 

78.4996 

2.97321 

2.97489 

2.97658 

9.40213 

9.40744 

9.41276 


8.87 

8.88 
8.89 

78.6769 

78.8544 

79.0321 

2.97825 

2.97993 

2.98161 

9.41807 

9.42338 

9.42868 


8.90 

79.2100 

2.98329 

9.43398 


8.91

8.92 

8.93 

79.3881 

79.5664 

79.7449 

2.98496 

2.98664 

2.98831 

9.43928 

9.44458 

9.44987 


8.94 

8.95 

8.96 

79.9236 

80.1025 

80.2816 

2.98998 

2.99166 

2.99333 

9.45516 

9.46044 

9.46573 


8.97 

8.98 

8.99 

80.4609 

80.6404 

80.8201 

2.99500 
2.99666 
2.99833

9.47101 

9.47629 

9.48156 


9.00 

81.0000 

3.00000

9.48683 


N

N²

√N

√10N
Table G. Squares and square roots (continued)


N

N²

√N

√10N

N

N²

√N

√10N

9.00 

81.0000 

3.00000 

9.48683 

9.50 

90.2500 

3.08221 

9.74679 

9.01 

9.02 

9.03 

81.1801 

81.3604 

81.5409 

3.00167

3.00333 

3.00500 

9.49210 

9.49737 

9.50263 

9.51 

9.52 

9.53 

90.4401 

90.6304 

90.8209 

3.08383 

3.08545 

3.08707 

9.75192 

9.75705 

9.76217 

9.04 

9.05 

9.06 

81.7216 

81.9025 

82.0836 

3.00666 

3.00832 

3.00998 

9.50789 

9.51315 

9.51840 

9.54 

9.55 

9.56 

91.0116 

91.2025 

91.3936 

3.08869 

3.09031 

3.09192 

9.76729 

9.77241 

9.77753 

9.07 

9.08 

9.09 

82.2649 

82.4464 

82.6281 

3.01164 

3.01330 

3.01496 

9.52365 

9.52890 

9.53415 

9.57 

9.58 

9.59 

91.5849 

91.7764 

91.9681 

3.09354 

3.09516 

3.09677 

9.78264 

9.78775 

9.79285 

9.10 

82.8100 

3.01662 

9.53939 

9.60 

92.1600 

3.09839 

9.79796 

9.11 

9.12 

9.13 

82.9921 

83.1744 

83.3569 

3.01828 

3.01993 

3.02159 

9.54463 

9.54987 

9.55510 

9.61 

9.62 

9.63 

92.3521 

92.5444 

92.7369 

3.10000 

3.10161 

3.10322 

9.80306 

9.80816 

9.81326 

9.14 

9.15 

9.16 

83.5396 

83.7225 

83.9056 

3.02324 

3.02490 

3.02655 

9.56033 

9.56556 

9.57079 

9.64 

9.65 

9.66 

92.9296 

93.1225 

93.3156 

3.10483 

3.10644 

3.10805 

9.81835 

9.82344 

9.82853 

9.17 

9.18 

9.19 

84.0889 

84.2724 

84.4561 

3.02820 

3.02985 

3.03150 

9.57601 

9.58123 

9.58645 

9.67 

9.68 
9.69 

93.5089 

93.7024 

93.8961 

3.10966 

3.11127 

3.11288 

9.83362 

9.83870 

9.84378 

9.20 

84.6400 

3.03315 

9.59166 

9.70 

94.0900 

3.11448 

9.84886 

9.21 

9.22 

9.23 

84.8241 

85.0084 

85.1929 

3.03480 

3.03645 

3.03809 

9.59687 

9.60208 

9.60729 

9.71 

9.72 

9.73 

94.2841 

94.4784 

94.6729 

3.11609 

3.11769 

3.11929 

9.85393 

9.85901 

9.86408 

9.24 

9.25 

9.26 

85.3776 

85.5625 

85.7476 

3.03974 

3.04138 

3.04302 

9.61249 

9.61769 

9.62289 

9.74 

9.75 

9.76 

94.8676 

95.0625 

95.2576 

3.12090 

3.12250 

3.12410 

9.86914 

9.87421 

9.87927 

9.27 

9.28 
9.29 

85.9329 

86.1184 

86.3041 

3.04467 

3.04631 

3.04795 

9.62808 

9.63328 

9.63846 

9.77

9.78 

9.79 

95.4529 

95.6484 

95.8441 

3.12570 

3.12730 

3.12890 

9.88433 

9.88939 

9.89444 

9.30 

86.4900 

3.04959 

9.64365 

9.80 

96.0400 

3.13050 

9.89949 

9.31 

9.32 

9.33 

86.6761 

86.8624 

87.0489 

3.05123 

3.05287 

3.05450 

9.64883 

9.65401 

9.65919 

9.81 

9.82 

9.83 

96.2361 

96.4324 

96.6289 

3.13209 

3.13369 

3.13528 

9.90454 

9.90959 

9.91464 

9.34 

9.35 

9.36 

87.2356 

87.4225 

87.6096 

3.05614 

3.05778 

3.05941 

9.66437 

9.66954 

9.67471 

9.84 

9.85 

9.86 

96.8256 

97.0225 

97.2196 

3.13688 

3.13847 

3.14006 

9.91968 

9.92472 

9.92975 

9.37 

9.38 

9.39 

87.7969 

87.9844 

88.1721 

3.06105 

3.06268 

3.06431 

9.67988 

9.68504 

9.69020 

9.87 

9.88 

9.89 

97.4169 

97.6144 

97.8121 

3.14166 

3.14325 

3.14484 

9.93479 

9.93982 

9.94485 

9.40 

88.3600 

3.06594 

9.69536 

9.90 

98.0100 

3.14643 

9.94987 

9.41 

9.42 

9.43 

88.5481 

88.7364 

88.9249 

3.06757 

3.06920 

3.07083 

9.70052 

9.70567 

9.71082 

9.91 

9.92 

9.93 

98.2081 

98.4064 

98.6049 

3.14802 

3.14960 

3.15119 

9.95490 

9.95992 

9.96494 

9.44 

9.45 

9.46 

89.1136 

89.3025 

89.4916 

3.07246 

3.07409 

3.07571 

9.71597 

9.72111 

9.72625 

9.94 

9.95 

9.96 

98.8036 

99.0025 

99.2016 

3.15278 

3.15436 

3.15595 

9.96995 

9.97497 

9.97998 

9.47 

9.48 

9.49 

89.6809 

89.8704 

90.0601 

3.07734 

3.07896 

3.08058 

9.73139 

9.73653

9.74166

9.97 

9.98 

9.99 

99.4009 

99.6004 

99.8001 

3.15753 

3.15911 

3.16070 

9.98499 
9.98999 
9.99500

9.50 

90.2500 

3.08221

9.74679 

10.00 

100.000

3.16228

10.0000

N

N²

√N

√10N

N

N²

√N

√10N
Alienation, coefficient of, 127-128
Analysis of variance, 252-373 
applications for significance: 

of correlation, linear, 272-275, 280 
of correlation ratio, 270-271, 280 
of differences: 

for correlated means, 294-296, 
322-323, 329, 337 
for independent means, 265-269, 
321, 337 

for trends, 347, 352-356 
of interaction, 306, 312-314, 332-335, 337, 340

of multiple correlation, 281-284 
of nonlinearity, 275-278, 280 
of reliability, 297, 300 
assumptions: 

homogeneity of variances, 252, 
265, 315-317, 337-338 
independent variance estimates, 
246, 252, 256 

normality, 252, 264-265, 311, 332 
violations, effect of, 252 
classifications: 
higher, 339-340 
one-way or simple, 253, 288 
three-way or triple, 318-323 
two-way or double, 288, 290-294 
computation: 

groups of unequal size, 269-270 
simple classification, 265-267 
three-way classification, 323-329 
two-way classification, 301-307 
covariance method, 362-373 
computation, 368-370 
and correlation, 365-366 
degrees of freedom, 365, 368 


Analysis of variance, covariance 
method, multiple, 372 
regression adjustments, 366-368, 
371 

situations for use, 362, 371-372 
sum of products, 365 
degrees of freedom, 255, 293, 321-322

error term for F, 309-315, 331-337, 
340 

factorial design, 340 
interaction, 290, 303 
higher, 340 

illustrations of, 307-309 
three-way, 321 
two-way, 306 

Latin square design, 341-344 
models, 262-263, 309-311, 331, 341 
fixed effects, 263, 309 
mixed, 309 
random, 262, 309 
pooling, 338-339 
preliminary tests, 338 
by ranks, 378-379 
significant F, meaning of, 264 
sum of squares, breakdown of, 253- 
255, 290-293, 322 
variance estimates, 89 
between-groups, 255 
expected value of, 257, 262, 310, 
311-314, 332-337, 342, 343 
interaction, 303 
meaning of, 256-264 
remainder, 293 
residual, 274, 293 
within-cells, 303, 330 
within-groups, 255 
Arbitrary origin, 16
Area sampling, 384 
Arkin, H., 12n 
Array, 110, 116 
Attenuation, 153-154, 208 
Attributes, 51 
Average, 1, 14-18 
Average deviation, 20 

Bartlett’s test, 249-250 
Best-fit line, 119-123 
Beta (β) coefficients, 172
Binomial distribution, 41-46 
and chi square, 210-212 
and hypothesis testing, 46-51 
kurtosis of, 43 
mean of, 43 

and normal curve, 43-46 
and probability, 42 
skewness of, 43 
standard deviation of, 43 
Biserial correlation, 189-193 
Boneau, C. A., 106 
Brinton, W. C., 12n 
Brown-Spearman formula, 150, 208, 
299-300 

Central value (tendency), 13 
mean, 16-18 
median, 14-15 
mode, 14 

Changes, evaluation of: 
for categorical data, 52-55, 224-226 
by covariance method, 373 
for graduated series, 76-77, 80-83, 
101-102, 373 
Chesire, L., 195n 
Chi square (χ²), 198, 209
additive property of, 222-223 
applications as test: 
of agreement with a priori frequencies, 219
of changes, 224-226 
of correlated proportions, 224-226, 
227-228 

of correlation, 219, 221 
of goodness of fit, 220, 231-235 
of group differences, 219-224, 228- 
231 


Chi square (χ²), applications as test:
of independence, 219, 220-221 
assumptions, 217-219 
and binomial, 210-212 
combining of, 222-223 
continuity correction, 226-227 
degrees of freedom, 212-214, 234- 
235 

and discontinuity, 211 
distribution of, 214-217 
and F, 250 

and normal curve, 217, 250 
and null hypothesis, 216-217 
one- vs. two-tailed tests, 227 
and proportions, 224 
table of, 428-429 
and variance, 243-244 
and z, or x/σ, 211, 214, 224, 225-
226, 244, 250 
Cochran, W. G., 344 
Coded scores, 18, 22 
Colton, R. R., 12n 
Combined groups: 
mean for, 18 
standard deviation for, 24 
Common elements and correlation, 132 
Comparison of groups, 79; see also 
Significance, of differences 
Concordance coefficient, 379-381 
Confidence coefficient, 92 
Confidence interval, 89-92 
for correlation, 139 
for difference, 92-93, 104 
for mean, 92, 101 
for variance, 245-246 
Confidence level, 92 
Confidence limits, see Confidence interval

Confounded, 333 
Contingency coefficient, 198-201 
Contingency table, 198, 219-220 
Continuity, correction for, 45, 51, 54, 
55, 226-227 
Continuous series, 5 
Correction: 

for attenuation, 153-154, 208 
for continuity, 45, 51, 54, 55, 226- 
227 
Correction, for grouping, 24

for uncontrolled variable, 362-373 
Correlation and causation, 132 
Correlation between: 

categorized variables, 193-202 
dichotomized and graduated variables, 
189-193 

dichotomized variables, 193-198, 201 
gain and initial, 158-161 
indexes, 162-163 
means, 84 

point variables, 197-198 
standard deviations, 84 
sums or averages, 206-208 
Correlation: 

factors affecting: 

errors of measurement, 153-154 
heterogeneity, 144-145 

third variable, 164-168, 366 
indexes, 162-163 
part-whole, 164 
range of talent, 144-145 
sampling errors, 137-139 
selection, 136-137 
measures of: 

biserial, 189-192 
contingency, 198-201 
correlation ratio (eta), 202-203, 278-279

fourfold point, 197-198 
intraclass, 284-285, 299 
multiple, 169-187; see also Multiple correlation
part, 167-168 
partial, 164-167 
point biserial, 192-193 
product moment, 112-135; see also 
Product moment correlation 
rank, 203-205, 379-381 
tetrachoric, 193-197 
Correlation ratio (eta), 202-203 
computation of, 278-279 
sampling significance of, 270-271, 280 
Correlations, averaging of, 140 
Covariance, 363; see also Analysis of 
variance 

Cox, G. M., 344 
Crespi, L., 230 


Critical ratio (CR, or z), 50, 54

and chi square, 211, 224, 225-226, 
244, 250 
and F, 250-251 
and t, 99, 102-103 
Critical region, 65 

Cumulative frequency distribution, 9 
Curvilinearity, test of, 275-278, 356-361

Decile, 19 
Degrees of freedom: 

for chi square, 212-214, 234-235 
for F, 247 
for t test: 

for means, 99-101, 104 
for r, 138 

for variance estimate, 99-101 
in analysis of variance, 255, 272-273, 
281, 293, 321-322 
Deming, W. E., 384n 
Differences, see Significance, of differences

Discontinuity, see Continuity 
Discrete series, 5, 37 
Discriminant function, 205-206 
Distribution: 

binomial, 41-46 
chi square, 214-217 
cumulative, 9 
expected, 37 
F, 247 
frequency, 6 
joint, 57 

mathematical, 37 
normal, 30-35 
observed, 37 
population, 37 
sampling, 50, 74 
t, 99 

theoretical, 37 

Distribution-free methods, 374-381 
chi square as, 375 
Friedman test, 378-379 
Kendall’s W, 379-381 
Kolmogorov-Smirnov test, 235 
Kruskal-Wallis test, 378 
Mann-Whitney U test, 377-378 
“median” test, 376 
sign test, 376 
Doolittle method, 180-184
Duncan multiple range test, 286n 

Edwards, A. L., 339n 

Elderton’s table for chi square, 217 

Error: 

absolute, 145 
constant, 145 

in drawing conclusions, 63-69 
of estimate, 124-128, 173-175 
of measurement, 145-148, 296-299 
reduction, 84-85, 382-387 
relative, 146 

sampling, see Standard error
standard, see Standard error 
type I and type II, 64-68 
variable, 145 

Estimate, error of, 124-128, 173-175 
Estimation: 
interval, 89-93 
point, 89, 241-243 
Estimator: 
consistency, 89 
efficiency, 89 
unbiased, 89, 241-243 
Eta (η), 202-203
computation of, 278-279 
sampling significance of, 270-271, 280

Expected value, 241, 243, 257 
Ezekiel, M., 186 

F, or variance ratio, 247 
and chi square, 250 
degrees of freedom, 247 
distribution, 247 

error term for, 309-315, 331-337,
340

for group variances, 248 
of independent estimates, 246-250 
and t, 251, 268, 274, 295
table of, 431-433 
and z, or x/σ, 250-251
Factorial design, 340 
Fiducial limits, 92 
Finite universe, 93-94 
Fisher, R. A., 64, 139, 246, 373, 427-
433

Fitting of line, 119-123 


INDEX 

Form vs. form reliability, 151, 296-297,
314-315

Fourfold point correlation, 197-198 
Fourfold table, 53 
and changes, 53-54, 225 
chi square for, 201, 220
and contingency, 198, 201 
exact probability for, 236-239 
and point correlation, 197-198 
and tetrachoric r, 193-197 
Frequency: 
as area, 8-9 

comparison, see Chi square 

cumulative, 9 

curve, 8 

distribution, 6 

polygon, 7 

table, 6 

Friedman test, 378-379 

Goodness of fit, 220, 231-235
Graduated series, 5 
Graphic presentation, 7-12 
histogram, 7 
line graph, 11 
ogive, 10 
polygon, 7 
Grouping, 6 
and coding, 18 
correction for, 24 
Guessed average, 17 

Heterogeneity and correlation, 144-145,
164-168, 366
Histogram, 7 
Homoscedasticity, 124 
test of, 249-250 
Horst, P., 339n 
Hypotheses, 47, 61 
alternate, 61 
null, 52, 61 

one- vs. two-tailed, 61-63 
research, 61 
statistical, 61 

Independence, test of, 219 
Indexes: 

correlation of, 162-163 
mean of, 162 
standard deviation of, 162 
Interaction, 290, 303, 307-309, 321,
339-340 

and correlation, 316-317 
and group profiles, 337 
and trends, 347 
Intervals, grouping, 6 
Intraclass correlation, 284-285, 299 

Joint occurrences, 56-58 

Kelley, T. L., 180, 201 

Kendall, M. G., 204 

Kendall’s W (concordance), 379-381 

Kolmogorov-Smirnov test, 235 

Kruskal-Wallis test, 378 

Kurtosis, 13, 25, 26 

Latin square design, 341-344 
Level: 

of confidence, 92, 93 
of significance, 48, 63-69, 93 
Lewis, D., 357 
Lindquist, E. F., 252n 
Line graph, 11 
Linear component, 350 
Linearity of regression, 120, 128 
test for, 275-278 

McCall, W. A., 36 
Mann-Whitney U test, 377-378 
Matched groups by means of: 
matched distributions, 385 
paired cases, 82, 85, 385 
randomization, 386 
siblings and twins, 82, 386-387 
Mean, 16-18 

for combined groups, 18 
computation, 16-17 
sampling error of, 74-75, 98, 240- 
241, 384 

Mean difference, significance of, 76-77, 
80-83, 101-102 
Measurement: 

levels of, 374-375 
and permissible statistics, 375 
Measurement errors, 145-148, 296-299 
for change scores, 155-158 
for difference scores, 155-158 
effect on: 

comparison of means, 154-155 
correlation, 153-154 


Measurement errors, effect on: 

matching of groups, 161-162 
slopes, 155 

and regression, 158-161 
Median, 14-15 
“Median” test, 376 
Mode, 14 

Models in analysis of variance, 262-263, 
309-311, 331, 341 
Moments, 25 
Moving averages, 8 

Multiple correlation, 174-175, 178, 180 
in covariance, 372 
and determinants, 179-180 
and diminishing returns, 186 
and discriminant function, 205 
Doolittle method, 180-184 
error of estimate, 173-174 
interpretation of, 175 
limitations, 185-186 
notation, 187 

numerical solution, 180-184 
regression equations, 171-173, 177 
relative weights, 175-177 
sampling error of, 184, 281-284 
selection fallacy, 185 
and shrinkage, 184-185, 283-284 
and suppressant variable, 186-187 

Nonlinearity, test of, 275-278 
Nonparametric methods, 374-381; see 
also Distribution-free methods 
Normal correlation, 133 
Normal distribution curve, 30-35 
area under, 33 
equations for, 30 
and probability, 44-46 
table of, 33-34, 424-425 
unit form of, 30 
Norton, D. W., 252n 
Notation, 96-97, 139 
Null hypothesis, 52, 61

Ogive, 10 

One- vs. two-tailed tests, 61-63, 105
binomial, 48-49, 56, 211 
chi square, 227, 239 
fourfold table, 237-239 
F ratio, 248-249 
Orthogonal polynomials, 347 
Paired cases, 82, 85, 384-385
Parameter, 2 

Part correlation, 167-168 
Partial correlation, 165-167 
sampling error of, 167 
Part-whole correlation, 164 
Paterson, D. G., 180n 
Paull, A. E., 339

Pearson, K., 29, 133n, 194, 217n, 239 
Percentage, see Proportion 
Percentile, 19-20 
Peters, C. C., 105 
Point biserial correlation, 192-193 
Point series (variable), 5, 37 
Polynomial forms, 361 
Power of a test, 67 
Prediction, error of, 124-128, 173-175 
Probability, 39-40 
addition theorem, 40 
approximations to, 44-46 
as area, 46 
and binomial, 41-46 
and hypothesis testing, 46-49 
of joint occurrence, 57 
as level of significance, 48 
multiplication theorem, 40 
of type I error, 64 
of type II error, 64-69 
Probable error, 96 
Product moment correlation, 112 
assumptions, 120, 124, 128-129, 131,
134-135

computation, 112-115 
direction of, 127 
interpretations, in terms of: 
common elements, 132 
error of estimate, 124-128 
normal surface, 133 
rate of change, 124 
variance explained, 131 
limits for, 133-134, 154, 167 
sampling error of, 137-139, 272-275 
scatter diagram, 110-111, 116-119 
Profiles and interaction, 337 
Proportion, sampling error of, 50-52 
Proportions as means, 95 

Quadratic component, 357-360 
Quartile, 19 
Quartile deviation, 19 


Quota sampling, 384 

Random sampling, 51, 73, 382-383 
Randomization, 386 
Range, 6, 19 

Rank correlation, 203-205 
Kendall’s tau, 204 
Kendall’s W, 379-381 
Spearman’s rho, 203 
Ranks, mean and variance of, 376-377 
Regressed scores, 160-161 
Regression, 123 
coefficient, 123 
equations, 123, 171-173 
test of linearity, 275-278 
Relative deviate, 32 
Reliability, 145-152, 296-301 
and attenuation, 153-154, 208 
of average scores, 297-298 
of change scores, 155-158 
coefficient of, 146 
of difference scores, 155-158 
error of measurement, 147-148 
form vs. form, 151, 296-297, 314-315 
and intraclass r, 299 
range, effect on, 152 
via repeated measurements, 297-299 
significance of, 297, 300 
split-half, 150 
test-retest, 149 
Renshaw, M. J., 305 
Replication, 310 
Residuals, 130, 174, 272 

Saffir, M., 195n 
Sampling, 51, 73-74 
distribution, 50, 73-74 
binomial as, 50 
of chi square, 214-217 
empirical demonstration of, 71-73 
of F, 247 
of t, 99 

errors, reduction of, 84-85, 382-387 
for experimental and control groups,
85, 384-387

from finite universe, 93-94 
independence of units, 93, 218-219 
size required, 68, 85-86, 108 
from skewed universe, 94-95, 107,
252
Sampling, successive, 74
techniques, 382-384 
area, 384 
quota, 384 

random, 51, 73, 382-383 
stratified, 383-384 
systematic, 383 
theory, 51-52, 73-75 
variance, 74 

Scales of measurement, 374-375 
Scatter diagram, 110-111, 116-119 
Scheffe, H., 286, 345 
Selected contrasts or comparisons, 285- 
287, 345 

Shrinkage of multiple r, 184-185, 283-284

Siegel, S., 239 
Sign test, 376 
Significance, 48 
choice of level, 63-69 
of correlation, 137-139, 272-275, 
280 

of correlation ratio, 270-271, 280 
of curvature, 356-361 
of differences: 

for changes, 86-88, 104-105 
for correlations, 139-140 
for linear trends, 352-356 
for means: 

correlated, 80-83, 101-102, 294- 
296, 322-323, 337 
independent, 83, 102-104, 265- 
269, 321, 337 
sub- vs. total group, 94 
for proportions: 

correlated, 52-56, 224-226, 227- 
228 

independent, 56-61, 221-224, 
228-231 

for regression coefficients, 140-143 
for scores, 155, 158 
for slopes, 143, 352-356 
for standard deviations, 84, 246- 
250 

for variances: 

Bartlett’s test, 249-250 
correlated, 246 
independent, 246-250 
and erroneous conclusions, 64-68 
Significance, of interaction, 306, 312-
314, 332-335, 337, 340 
levels, 48, 63-69 
of linear trend, 348-352 
of mean change, 76-77, 80-82, 101- 
102 

of multiple r, 184, 281-284 
of nonlinearity, 275-278 
of quadratic trend, 356-361 
of regression coefficient, 140-143 
of reliability, 297, 300 
of skewness, 78-79 
of slope, 140-143, 348-352 
Skewness, 13, 25, 26, 27-28 
of binomial distribution, 43 
causes of, 28 
of sampling distributions: 
of correlations, 138-139 
of proportions (or percentages), 
50-51 

of standard deviations, 99 
Small sample treatment: 
of correlation, 137-138, 140 
of differences: 

for correlated means, 101-102 
for independent means, 102-104 
for variances, 246-250 
of single mean, 101 
of variance, 245-246 
see also Analysis of variance 
Smoothing, 8 
Snedecor, G. W., 247 
Spearman-Brown formula, 150, 208, 
299-300 

Split-half reliability, 150 
Spurious correlation, 163, 164 
Squares and square roots, 434-442 
Standard deviation, 20-25 
for combined groups, 24 
computation of, 20-23 
sampling error of, 78 
Sheppard’s correction, 24 
Standard error, 50, 74 
of average deviation, 78 
of correlation measures: 
biserial, 191, 193 
multiple, 184 
product moment, 137 
tetrachoric, 196-197 
z (transformed r), 139 
Standard error, of kurtosis, 78
of mean, 74-75, 240-241 
from finite universe, 94 
for stratified sample, 384 
of mean difference, 77, 81 
of median, 78 
of proportion, 50-51 
from finite universe, 94 
for stratified sample, 383 
of quartile deviation, 78 
of regression coefficient, 142 
of skewness, 78 
of standard deviation, 78 
Standard error of difference: 
for changes, 86-88 
for means: 
correlated, 81 
independent, 83 
sub- vs. total group, 94 
for medians, 84 
for proportions: 
correlated, 55 
independent, 60 
for scores, 155-156 
for standard deviations, 84 
for zs (transformed rs), 140 
Standard error of estimate, 124-128, 
173-175 

Standard error of measurement, 147 
Standard score, 32, 35-37 
and T score, 36-37 
Statistic, 2 

Stratified sampling, 383-384 
“Student,” 387 
Successive sampling, 74 
Sum of squares, 23, 101; see also Analysis of variance

Suppressant variable, 186-187 
t ratio, 99 

assumptions and limitations, 105-108 
and confidence limits, 101, 104 
for correlation, 138 
degrees of freedom, 99-101, 104 
for difference: 

in correlated correlations, 140 
in correlated means, 101-102 
in correlated variances, 246 
in independent means, 102-104 


t ratio, distribution of, 99 
and F, 251, 268, 274, 295 
for rank correlation, 204 
for single mean, 101 
table of, 430 
and z ratio, 102-103 
T score, 36 
Tabulation, 5-7 
Taubman, R. E., 326 
Test-retest reliability, 149 
Tetrachoric correlation, 193-197 
Thurstone, L. L., 195n 
Transformation: 
mathematical, 188 
standard score, 32, 35-37 
T scaling, 36 
Trend analysis: 

curvilinear trend, 356-361 
differences in trends, 347 
individual trend (or slope), 354 
linear trend, 348 

correlated observations, 351-352 
independent observations, 348-351 
slope differences, 352 

correlated observations, 354-356 
independent samples, 143, 352-354 
trends and interaction, 347 
True score, 146 
Two-tailed tests, 61-63 

U test, 377-378 

Van Voorhis, W. R., 105 
Variance, 21 

additive nature of, 59, 129-130 
and chi square, 243-244 
computation, 20-23 
confidence limits, 245 
and correlation, 129-131, 175 
difference between, 246-250 
of differences, 59, 81, 129-130 
estimate, 89, 241-243 
homogeneity of, 249-250 
ratio, see F 

sampling distribution of, 243-244 
of sums, 59, 129-130 
theorem, 129-130 
see also Analysis of variance 


Variation, 13 

average deviation, 20 
coefficient of, 162 
quartile deviation, 19 
standard deviation, 20-25 

Walker, E. L., 302, 307n

Wright, Suzanne T., 265, 288 

Yates, F., 384n, 427-433 
Yates’ correction for continuity, 226-
227 

z as difference between standard deviations, 246

z as relative deviate, 32 
z score, 32 

z transformation for r, 139 
tables of, 426-427 
z as x/σ (or x/S) ratio, 32





